OPERATOR WORKLOAD: COMPREHENSIVE REVIEW AND EVALUATION OF OPERATOR WORKLOAD


Requirement: 

The overall purpose of this report is to provide useful and practical
information concerning operator workload (OWL). It is specifically aimed at
information applicable in conceptualizing, specifying, designing, developing,
or evaluating systems for the Army.


Procedure:

Relevant research and published materials were identified and obtained
through libraries and personal contact. The literature obtained was reviewed
for specific OWL techniques. A workload technique taxonomy served as the
organizational scheme within which the workload literature was reviewed.
Organizations engaged in significant workload research were visited to discuss
current and ongoing OWL research.


Findings: 

Operator workload is explained and defined using several informal examples,
definitions of workload used by researchers as reported in the literature,
the foundation of a general definition, a framework (taxonomy) to
organize the various workload estimation techniques, and some general issues
concerning the techniques. After considering a variety of performance issues
and definitions in the literature, the idea of a performance envelope is
developed. Workload determines the current position in the envelope. The
primary interest is the operator's position relative to the boundaries of the
envelope and the operator's relative capacity to respond.

Techniques that have been used for assessing OWL and determining the
operator's current and future position in the performance envelope are reviewed
and analyzed. These techniques are classified into two broad categories:

•  Analytical: predictive techniques that may be applied early in system
design without an operator-in-the-loop, and

•  Empirical: operator workload assessments that are taken with an
operator-in-the-loop during simulator, prototype, or system
evaluations.

The analytical techniques can be used early in system design when there
is greatest design flexibility and throughout the materiel acquisition
process. The analytical category includes comparison techniques, mathematical
models, expert opinion, task analyses, and simulation. Considerable progress
has been made in developing workable analytical tools, but much remains to be
done.


Empirical techniques are used when operators and a simulator, a prototype,
or a system are available for testing. The empirical category includes
primary task measures, subjective methods, secondary task techniques, and
physiological techniques. Each of these subcategories is discussed in a
separate chapter. Descriptions of the methods and techniques are provided, along
with discussion concerning available information about their validity,
reliability, sensitivity, diagnosticity, intrusiveness, and practicality.
Recommendations for application are included with the discussion of individual
techniques.
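
The two-category classification described above can be written down as a small
lookup structure. The following Python sketch is illustrative only: the
dictionary layout and the function names candidate_categories and
candidate_techniques are assumptions made here, not part of the report. The
category and subcategory names are taken from the text.

```python
# Taxonomy of workload techniques as summarized in the Findings.
# The data layout and function names are illustrative assumptions.
TAXONOMY = {
    "analytical": [
        "comparison",
        "mathematical models",
        "expert opinion",
        "task analysis",
        "simulation",
    ],
    "empirical": [
        "primary task measures",
        "subjective methods",
        "secondary task techniques",
        "physiological techniques",
    ],
}


def candidate_categories(operator_available, testbed_available):
    """Return the technique categories that can be applied.

    Analytical techniques need no operator-in-the-loop, so they are
    always candidates. Empirical techniques need both operators and a
    simulator, prototype, or system (called the "testbed" here).
    """
    categories = ["analytical"]
    if operator_available and testbed_available:
        categories.append("empirical")
    return categories


def candidate_techniques(operator_available, testbed_available):
    """Flatten the subcategories of every applicable category."""
    return [
        technique
        for category in candidate_categories(operator_available, testbed_available)
        for technique in TAXONOMY[category]
    ]
```

Early in design, with neither operators nor a testbed, only the analytical
subcategories are returned; once both are available, the empirical
subcategories are added to the list.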


Utilization  of  Findings: 

The information from the reviews is integrated into a foundation for a
practical guide for the user. Example case studies are provided, along with
suggestions for the most appropriate techniques, both analytical and
empirical, to use for various system-resource characteristics. A working guide is
also provided for a general approach to the selection and application of
workload techniques. This application guide encompasses all major issues.
Twenty-three general questions are developed to assist in identifying the
proper techniques. The answers provided by the user aid in the selection and
application of techniques. These include general questions about stage of
system development, category of system, and resources available (both
personnel and equipment), and a number of specific questions about workload.
Workload Methodologies
CHAPTER 1. INTRODUCTION


"The human factors in most practical situations have been neglected largely because of
consciousness of ignorance and our inability to control them [human factors]. Whereas engineers
deal constantly with physical problems of quality, capacity, stress, and strain, they have tended to
think of problems of human conduct and experience as unsolved or insoluble. At the same time
there has existed a growing consciousness of the practical significance of those human factors
and of the importance of such systematic research as shall extend our knowledge of them and
increase our directive power.

"The great war from which we are now emerging into a civilization in many respects new has
already worked marvelous changes in our points of view, our expectations and practical demands.
Never before in the history of civilization was brain, as contrasted with brawn, so important; never
before, the proper placement and utilization of brain power so essential to success.

"Reprinted in part from a Harvey lecture delivered by Major Robert M. Yerkes in
New York, January 25, 1918, and published with the approval of the Surgeon General of
the Army, from the Section of Psychology of the Medical Department." In turn, reprinted
from: Yoakum, C. S. & Yerkes, R. M. (Eds.) (1920). Army Mental Tests. New York: Henry
Holt and Company.

There are several noteworthy points about this quote from 1919. First, many problems about "human
conduct" can now be solved. Techniques have been developed in the last seven decades which are
applicable to these problems and, furthermore, engineers are using the results of these techniques as
design principles. This report is a testament to these techniques. However, just as evolving technology
produces better and more sophisticated hardware, technology will evolve to produce even better and
more sophisticated assessment techniques. Second, the trend for "brain, as contrasted with brawn," has
accelerated. Indeed, the bulk of the workload literature deals with brain and not brawn.

The Changing Role of Army Operators


Technology is becoming increasingly advanced and complex. As new systems are developed, new
technologies are employed, and the role of the operator is changed. The newest generation of advanced
military systems uses advanced computer technology for multifunction displays, decision aids, intelligent
systems, or computationally-assisted control. Technological advances have resulted in changes to
operational procedures and the functions of the system operators. Operators perform more planning,
supervisory, monitoring, and overseeing functions than in the past. In many instances computers are
doing the computational work and the operators are continually checking for system failures or emergency
conditions. It seems fair to characterize the changes in operator functions as more mental or cognitive in
nature. Furthermore, operators are often required to perform these functions in stressful and physically
demanding environments.

A plausible scenario could have an operator sitting in front of one or more computer displays. The
displays contain information which must be processed and acted upon. Several potential targets are
displayed and the operator must decide which, if any, should be fired upon and with what priority. In this
scenario, the operator is one member of a crew who is expected to perform both night and day, even
when fatigued. Some functions can be shared among the crew members; others cannot. The amount
and rate of displayed information is high; communications channels are open and busy; decisions must be
made within seconds. In this situation, the single operator or crew may not be able to perform the required
tasks within the critical time window. This situation may lead to operator overload resulting in performance
degradation and mission failure. This generic scenario is applicable to many emerging combat systems,
and this report is concerned with one specific part of this problem: Operator Workload (OWL).
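
The overload condition in this scenario can be stated as a simple comparison
between the time the tasks require and the time available. The sketch below is
a minimal illustration, assuming the tasks are performed one after another by a
single operator. The task names, durations, and the function name
fits_time_window are hypothetical and are not drawn from the report.

```python
def fits_time_window(task_durations_s, window_s):
    """Return (fits, required_s) for strictly sequential tasks.

    task_durations_s: mapping of task name -> seconds needed (hypothetical).
    window_s: length of the critical time window in seconds.
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    required_s = sum(task_durations_s.values())
    return required_s <= window_s, required_s


# Hypothetical numbers chosen only to illustrate the comparison.
tasks = {"scan display": 2.0, "prioritize targets": 3.5, "report": 1.5}
fits, required = fits_time_window(tasks, window_s=5.0)
```

With these made-up values the tasks need more time than the window allows, so
the operator cannot finish in time. Sharing a function with another crew
member, where that is possible, is one way the required time could be reduced.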

Current Status of OWL in the Army

MANPRINT is an Army initiative which considers the role of the soldier in system performance. Through
this initiative, the Army addresses the question: Can this soldier, with this training, perform these tasks to
these standards under these conditions? The Army MANPRINT guidance is contained in Army
Regulation (AR) 602-2, Manpower and Personnel Integration (MANPRINT) in the Materiel Acquisition
Process (U.S. Army, 1987). It is clear that AR 602-2 requires that MANPRINT issues be addressed, and
hence that human performance data be obtained and analyzed, at all stages of the Materiel Acquisition
Process (MAP). It is also increasingly apparent that the MAP often does not allow consideration of the
possible effects of exceeding OWL capacity. Due to changes in technology, cognitive overload is more
likely than in the past, and this cognitive overload can easily induce operator errors and cause critical
information to be processed incorrectly or missed entirely, leading to a degradation of system
performance.

While  it  can  be  argued  that  OWL  concerns  are  not  synonymous  with  MANPRINT,  OWL  is  related  to  the 
six MANPRINT domains. These domains are:

•  Manpower


•  Personnel 

•  Training 

•  Human  Factors  Engineering 

•  Safety 

•  Health  Hazards 

Consideration of the interrelations between OWL and the MANPRINT domains will assist in identifying
MANPRINT trade-offs that may be made in an effort to maximize system performance. For example,
economic pressures to reduce crew sizes (manpower) have an immediate impact on operator workload. As
new devices are added to replace humans, the workload of the reduced crew certainly changes and the
perceptual and mental workload of individual operators may actually increase. This, in turn, has an impact on
the interrelation between OWL and personnel issues, which involves trade-offs between soldier quality (as
measured by the Armed Services Vocational Aptitude Battery [ASVAB]) and the degree to which soldier
perceptual, mental, and psychomotor loading occur. Further, workload may vary due to training, soldier
quality, soldier-machine interface, and the degree of soldier information loading. Knowledge of the
OWL-related requirements may assist in better, more efficient personnel and training actions. The MANPRINT
domains overlap, and because of this a change in one domain will influence others,
including OWL. Clearly, MANPRINT and workload concerns are interrelated.

A requirement has been established that OWL issues need to be addressed at all stages of the MAP.
The regulation AR 602-1, Human Factors Engineering Program (U.S. Army, 1983), specifies that the
Human Factors Engineering (HFE) program shall be performed in accordance with MIL-H-46855B, Military
Specification: Human Engineering Requirements for Military Systems, Equipment and Facilities (U.S.
Army, 1979). This latter military specification (Section 3.2.1.3.3) requires that individual and crew workload
analyses shall be performed and compared with performance criteria. However, no guidance is provided
to the system developer as to how such a workload analysis should be performed (Hill & Bulger, 1988).
This lack of guidance has led to the effort which comprises the body of this report.

Purpose  of  the  Report 

A  goal  of  this  report  is  to  present  a  review  of  currently  available  methodologies  and  techniques  that 
have been developed and used in the assessment of OWL. In this effort, more than 1500 reports were
reviewed and close to 500 research reports are cited. This review was intended as a critique of the
methods and techniques that have previously been used to examine workload. It contains descriptions of
the methodologies and techniques as well as discussions concerning the available information regarding
validity, reliability, sensitivity, intrusiveness, and practicality. In addition to methods and techniques that
have previously been used to assess workload, other methods are also identified that may be applicable
to OWL.

A second, equally important goal of this report is to analyze and integrate these methodologies into a
practical guide for the user. Thus, the reviewed techniques are analyzed with respect to reported
effectiveness and resources needed for implementation. References guide the reader to sources for
additional information. In addition, sample applications are considered in Chapter 8. The overall purpose
of this report is to provide useful and practical information concerning OWL to those involved in
conceptualizing, specifying, designing, developing, and evaluating systems for the Army.

Methodology Used in this Report

The approach of this comprehensive review of OWL research and methodologies had two major
thrusts. The first was to provide a technical review and analysis of available literature related to OWL. The
second thrust was to be aware of the practical utility and importance of OWL issues to the Army. In recent
years, OWL has received considerable research attention, reflecting its importance, and efforts continue to
understand theoretical as well as application issues. The practical ways in which workload issues could
impact system performance, conceptualization, design, development, and evaluation were considered at
all times.

Review Approach


Relevant research and published materials were identified and obtained through libraries and personal
contact. The literature obtained was reviewed for specific OWL techniques. The workload technique
taxonomy, described in Chapter 2, served as the organizational scheme within which the workload
literature was reviewed. Organizations engaged in significant workload research were visited to discuss
current and ongoing OWL research. These included Douglas Aircraft Company; NASA-Ames Research
Center; Wright-Patterson Air Force Base; USAHEL; and NASA-Langley Research Center.

The usefulness of the various techniques for addressing Army needs was the focus of the project.
Particular emphasis was placed on the sensitivity of the OWL techniques for measuring differences in
various tasks. In addition, other important practical criteria that received particular consideration are the
intrusiveness of the techniques, their relative costs, and the level of expertise needed for their use.
Descriptions of the techniques and discussion concerning their implementation in Army applications are
given throughout the volume.


Organization  of  the  Report 

This report presents a review and synthesis of literature related to operator workload. Each chapter
begins with a brief discussion of the purpose of the techniques. In the body of the chapter, definitions,
details, and examples of the techniques are given as well as research concerning the specific techniques.
Included in these discussions are comments about issues concerning salient characteristics which will be
defined later; these include the issues of sensitivity, diagnosticity, intrusion, validity, reliability,
implementation, operator acceptance, and relative cost of use of key techniques. These criteria were
chosen as important to practitioners and as appropriate to characterize the methods.

In Chapter 2, basic issues concerning OWL are discussed, including the definition of operator
workload, a taxonomy of workload assessment methods and techniques, as well as other important
general OWL issues. Subsequently, the descriptions and discussions related to specific workload
techniques and methodologies are presented in Chapter 3 for Analytical Techniques and Chapters 4
through 7 for Empirical Techniques. The organization of these five chapters follows the organization of
the taxonomy to be presented in Chapter 2. Chapter 8 describes an approach for the selection of
appropriate techniques for assessing workload. Finally, a concluding and summary chapter is provided,
including a different perspective on OWL and indications of some future directions.
CHAPTER 2. THE CONCEPT OF WORKLOAD


What Is Workload?

This chapter provides a discussion of the concept of workload and some definitions. It provides the
formal background for the reviews and evaluations of the specific workload assessment techniques
discussed in subsequent chapters. However, in order for the reader to have an intuitive understanding of
operator workload, several examples are presented first.

An Example: Driving

As an illustration of what is meant by the term workload, imagine you are driving in your favorite car. As
you go through this mental exercise, we will increase the difficulty with each successive statement in a
number of different ways. Additionally, we will use some words like stress and effort in a colloquial manner;
these will be defined more precisely later. When we are through with the exercise you may not know
exactly what workload is, but you will have a feel for the range of operator workload possible for a task as
common as driving a car. And more importantly, you will have a feel for the importance of workload and the
various factors that affect workload. One point we wish to make in this example is that workload is not the
same as performance.


•  Since you are an important personage, the State Police have closed the Interstate to
all other drivers. You are cruising down the highway at the speed limit on a nice,
sunny day. Easy driving, right?

•  You have just passed the state line. This second state doesn't think you are quite as
important and now you have some traffic. Still not bad.

•  You have been driving for a while. It is approaching the rush hour near a metropolitan
area and traffic is picking up.

•  It is Friday afternoon and everyone wants to get home or out of town before the storm
hits. Traffic is now much heavier than normal and slowing down. (We must be in
Connecticut.)

•  You left early this morning and didn't realize you hadn't stopped for lunch. You're
tired and hungry.

•  Traffic is now reduced to a crawl. You also forgot to get gas when you forgot lunch.
You've got to get to an exit and find a gas station.

•  While you are crawling along, the weather has turned. It is now raining.



•  It has also gotten dark and visibility is not good. The highway is not well marked and
you must be careful not to miss your turnoff.

•  Worse, the car in front does not have brake lights so you have to pay very close
attention to this stop-and-go stuff. Eyes on car in front.

•  A few miles are covered, but with the dark, the outside temperature has also dropped.
It is no longer just raining, it is freezing. Several cars are off the road. Still bumper to
bumper and gas is getting very low.

•  Your two year old, who was sleeping in the backseat, wakes up. He is hungry, scared,
and crying.

•  It's not a lot of fun with all that is going on. In addition, the engine sounds like it is
missing and you know you are not yet quite out of gas. (You've turned the radio down
and would like to turn the kid down.)

You are about to 'lose it,' as anyone who has been in a similar situation can attest. Improbable, yes, but not
impossible. (And note, we didn't cheat by giving you an unfamiliar vehicle with shift instead of automatic,
or even an English car with the wheel on the 'wrong side.' We assumed that your prior training and
experience was in effect.) Further, we didn't even have hostiles shooting at you. Nor did we have you
crash: performance remained acceptable.


A  Second  Example:  Mental  Load 


Before we start discussing workload in a formal way, we want to consider one more example, this time
strictly mental load. First, we are going to ask you to do a couple of tasks that are highly overlearned and
very easy. Then we will do the tasks again, but in a combined manner. Not only does the demonstration
illustrate an example of cognitive workload, it illustrates an important point about measuring workload: Two
easy tasks added together can sometimes result in a very difficult task. Not an easy situation to predict. As
you do the task, take your time. You might even want to time yourself on each of the parts.

•  Recite  the  alphabet, 

•  Count  from  1  to  26, 

•  Now do both, interleaving the alphabet with the counting, A-1, B-2, etc., saying the
answers.

If you actually got all the way through the combined task, you are unusual. Most people give up about G-7
or H-8. Why is it so difficult? Let us use this example to diagnose the basis of the difficulty and illustrate
workload analysis. Get out a pencil and a piece of paper. Do the double task again, this time writing down
the answers. Any difficulty in getting all the way through this time? Part of the difference between the two
is that the pencil and paper reduces the heavy burden on memory. There are some additional reasons,
but the point is that the same task can be difficult or relatively easy depending on how we do it. And we
can identify the reasons for the differences. In this example, there is usually a performance failure on the
first attempt, which burdens memory, and success on the second attempt, which uses pencil and paper.
Performance is acceptable only in the second case.
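
The expected answers for the combined task can be written out explicitly, which
shows that the sequence itself is trivial to produce when nothing has to be
held in memory. This is a minimal Python sketch; the function name
interleaved_sequence is an assumption made here for illustration.

```python
import string


def interleaved_sequence():
    """Return the expected answers for the combined task: A-1, B-2, ..., Z-26."""
    return [
        f"{letter}-{number}"
        for number, letter in enumerate(string.ascii_uppercase, start=1)
    ]


answers = interleaved_sequence()
print(answers[:3], "...", answers[-1])
```

The program keeps its place in both sequences with a single loop counter, which
plays roughly the role that pencil and paper plays for a person.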

These two examples should give you an idea about the variation of difficulty of tasks and the difference
between measuring performance and the amount of effort you have to expend to perform the task.

In  this  chapter,  we  consider  a  number  of  general  issues  involving  workload.  Later  chapters  will  cover 
more specific, detailed issues. First we present a description of the relation of performance and workload.
This leads to a discussion of some human performance concepts in the context of system operation. To a
large degree, this discussion is the foundation of all that is to come later. Then, the chapter provides a
review and discussion of definitions of OWL. Also included is an organization of workload assessment
techniques in the form of a taxonomy that provides a structure within which to classify the measures.

Performance vs. Workload

Performance is what we are ultimately concerned with: Can the operator successfully complete the
mission? One goal of workload research is to predict impending doom, that is, failure of performance. Not only
do we not want the mission to fail, we also do not want the man or machine to be damaged. Having
anticipated and predicted a trouble spot, the second goal is to correct those situations in which
performance fails. As an aid in this effort toward better and safer performance, researchers have
developed the concept of workload.

The relation between workload and performance is illustrated in Figure 2-1. In the figure, it can be seen
that workload and performance seem to have an inverted U relation. At extremely low levels of workload, as
in Region 1, the operator may become bored (Hart, 1986a). Boredom can lead to missed signals and
instructions, resulting in poor performance (Parasuraman, 1986). (Although this report will not address
cases subsumed in Region 1, it is well to note that performance can be adversely affected if OWL is too
low as well as too high.) With a reasonable level of workload, performance can be expected to be
acceptable, as shown in Region 2. However, further increases of workload into Region 3 show a marked
degradation in performance. Figure 2-1 also illustrates that workload is not the same as performance.
Performance may remain at an acceptable level over a considerable range of workload variation, as in the
driving example. In general, however, workload extremes are related to poor performance.
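
The three regions of the inverted U can be made concrete with a toy model. The
sketch below is purely illustrative: the report gives no equation for the
curve, so the normalized 0-to-1 workload scale, the region boundaries LOW and
HIGH, and the piecewise-linear shape of performance() are all assumptions
chosen only to reproduce the qualitative relation.

```python
LOW = 0.2   # assumed upper bound of Region 1 (underload)
HIGH = 0.8  # assumed lower bound of Region 3 (overload)


def region(workload):
    """Map a normalized workload in [0, 1] to Region 1, 2, or 3."""
    if not 0.0 <= workload <= 1.0:
        raise ValueError("workload must be in [0, 1]")
    if workload < LOW:
        return 1
    if workload <= HIGH:
        return 2
    return 3


def performance(workload):
    """Toy inverted U: a plateau in Region 2, falling off on both sides."""
    r = region(workload)
    if r == 1:
        # Rises linearly from 0.5 at zero workload toward 1.0 at LOW.
        return 0.5 + 0.5 * workload / LOW
    if r == 2:
        # Acceptable performance across the whole middle range.
        return 1.0
    # Falls linearly from 1.0 at HIGH to 0.0 at maximum workload.
    return max(0.0, 1.0 - (workload - HIGH) / (1.0 - HIGH))
```

In this toy model, performance() returns the same value everywhere in Region 2
even though workload varies, which is the sense in which workload is not the
same as performance.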
Internal and External Performance Limitations


There is general agreement about many of the determinants of good or poor operator performance.
Norman and Bobrow (1975), for example, differentiate between two categories of limitations on
performance: data-limited and resource-limited. Data limitations occur when task processing is
constrained by unavailable data, e.g., trying to read a map in the dark. This is a limitation external to the
operator; stimuli may be below threshold or may contain insufficient information to solve the problem. By
contrast, resource limitations occur when the human information processing system cannot handle the
data rapidly enough. In this case, performance decrements are due to internal limitations. In either case,
performance decrements can be observed in several forms: gradual, intermittent, or catastrophic. One of
the goals of OWL research is to uncover, identify, and eliminate those instances in which the demands of
human tasks would degrade human and system performance.

Figure 2-1. The hypothetical relationship between workload and performance. (This figure is a
compilation of the concept discussed in several places [e.g., Hart, 1986a; O'Donnell & Eggemeier, 1986;
Tole, Stephens, Harris, & Ephrath, 1982].)

Presumably, the data-limited decrements should be eliminated in the design phase of a system. Dials
and gauges should be easy to read; communications should be easy to understand, and all key data
accessible. However, to the extent that necessary information is simply not available during a mission, the
operator must seek the information from other sources or spend additional time estimating parameters
needed for decision making. This illustrates the important point that the operator, the system hardware,
and the environment all interact in affecting performance, and this interaction can change the nature of the
task. The form of the interaction can also have important consequences for mission performance.
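
The distinction between the two kinds of limitation can be expressed as a small
decision rule. The sketch is an assumption-laden illustration, not Norman and
Bobrow's formulation: the boolean data_available and the comparison between
arrival_rate and processing_capacity are simplifications introduced here, and
the function name limitation is hypothetical.

```python
def limitation(data_available, arrival_rate, processing_capacity):
    """Classify the dominant performance limitation.

    data_available: False when needed data are missing or below threshold
        (for example, trying to read a map in the dark).
    arrival_rate: information items per second presented to the operator.
    processing_capacity: items per second the operator can handle.
    The units and the rate comparison are illustrative assumptions.
    """
    if not data_available:
        return "data-limited"        # limitation external to the operator
    if arrival_rate > processing_capacity:
        return "resource-limited"    # limitation internal to the operator
    return "none"
```

Checking data availability first mirrors the point made above that data-limited
decrements should be addressed in the design phase, before operator capacity is
considered.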

A Model of the OWL Context: Factors Affecting Performance and OWL

The previous discussion illustrated one way of looking at performance limitations and interaction of the
human with the environment. Because human behavior is dynamic, such interactions abound, much to
the frustration of the workload researcher. To help the reader understand the intricacies of behavior,
performance, and workload, a brief discussion of the variety of influences on the operator is presented.

Performance is affected by two major kinds of factors: (a) the operator tasks defined by the mission, by
the environment, and by the design of the workstation and (b) the transitory states and stable traits of the
human operator. Figure 2-2 illustrates these factors, all of which combine to influence how the individual
will respond to the ongoing demands. The interaction of these factors will determine both operator
workload and operator performance and, hence, system and mission performance. Each of these
components is considered in more detail below. The upper portion of the figure contains some external
influences. The system design, mission, and other external factors combine to create situational demands
for the operator. The middle of the figure represents the operator, including a breakdown of some of
the internal factors of the operator which have a bearing on OWL. At the bottom of the figure, the ovals
represent approaches to obtaining responses from the operator which are used to make inferences about
the operator. It is important to note that the bottom oval, system performance, is directly related to
MANPRINT concerns. More will be said about these measurements in Chapter 4.

Situation  Demands  and  External  Influences 

Mission  Rsqulrsmsnts  and  Task  Allocation.  The  allocation  ot  system  functions  to  the  human  is  an 
initial  step  in  system  design  and  this  allocation  will,  in  turn,  lead  to  situation  demands  on  the  operator. 
During  system  design,  the  design  team  decides  which  functions  are  allocated  to  humans  and  which  are 
allocated  to  the  system.  Once  allocated,  those  functions  plus  the  design  of  the  controls  and  displays  will 
define the operator tasks. The tasks allocated to a given operator represent that operator's job. The
human factors technique of task analysis is concerned with understanding how these tasks will impact the
probable overall performance of the operator, and the extent to which some of these tasks might not be
performed at acceptable levels.

Figure 2-2. A conceptual framework of the OWL context and influences on operator/system performance.

Two  tasks  may  differ  in  a  variety  of  ways  which  can  affect  their  accomplishment.  The  two  tasks  may 
require different types of actions. In turn, the actions of one task may require more effort or time from the
operator than those of the other. Regardless of the type of task, the operator must perform some sequence of acts
on some objects or entities in order for the task to be accomplished. In some tasks, a majority of the
required actions involve manipulations of physical objects. Other tasks may be dominated by actions
requiring  the  operator  to  sense  or  perceive  the  attributes  and  characteristics  of  objects.  Still  other  tasks 
exist  in  which  a  majority  of  the  actions  involve  manipulations  of  internalized  definitions,  facts,  or  concepts. 

Similarly, two jobs may differ in the kinds of tasks required of an operator, and in the sequence in which
the  tasks  must  be  performed.  Some  jobs  may  have  many  tasks  that  do  not  overlap  in  time.  Other  jobs  may 
have  multiple  ongoing  tasks  during  the  same  time  periods  and  require  the  operator  to  time-share  among 
those  tasks. 

Finally, the system's machine capabilities (e.g., sensor, data processor, and propulsion subsystems),
the relative capabilities of hostile forces, and the availability and capabilities of cooperating, friendly forces
will change from mission to mission, and will impact the speeds and accuracies with which various operator
tasks  must  be  accomplished. 

In summary, tasks can influence the workload that will be imposed on the operator by:

•  Actions  required  by  each  task, 

•  Sequence  of  actions  performed  for  a  task, 

•  Number  and  types  of  tasks  to  be  performed, 

•  Time  available  for  each  task  to  be  completed, 

•  Overall  time  constraints,  and 

•  Required  accuracy  levels. 

Taken together, these influences constitute a comprehensive set of factors that contribute to the
situation  demands  illustrated  in  Figure  2-2. 


The Environmental Context. The tasks performed by the operator are not done in isolation,
however. A given task may occur in widely differing circumstances that can affect the level of difficulty of
that task for the operator. The way in which the operator interacts with the immediate surroundings will
also have important implications for performance and workload. It is widely recognized by engineers that
machine components cannot tolerate some kinds of physical disturbances. They must be protected
(hardened) to function in the presence of hostile environments. Detailed attention is given to specifying
how machine components will be packaged, supported, and interfaced with other machine components.
Similar attention must be given to the support and interfacing of humans, both with one another and with
machine components. Among the external factors which alter situational demands and which affect levels
of task difficulty are:

• The external environment in which the task must be performed (e.g., heat, humidity,
sound, illumination, vibration, and g-forces)

• The design of the human-machine information exchange units (e.g., types and sizes
of displays and controls, and their layouts and formats)

• The design for human packaging (e.g., protective clothing, seating, and restraints)

• The design of the overall workstation (e.g., its size, internal lighting, ventilation,
temperature and humidity control, and vibration dampening)

To  a  large  extent,  external  environmental  factors  cannot  be  controlled  by  the  system  design  team;  these 
are determined by the missions. However, the immediate external environment and the extent to which it
impinges on the operator can be partially controlled by other design factors. Because many operator tasks
involve the exchange of information between the machine and human, the design of the operator
console will affect human performance on the tasks. Both the speed and accuracy with which the operator
can perform a given task and the extent to which the operator can maintain acceptable performance for
long periods of time will be partially dependent on the ambient environment. Thus, operator support and
workstation  design  factors  will  influence  the  workload  of  the  operator. 

The  Operator 


Every  operator  enters  into  a  situation  carrying  a  number  of  influences  which  can  impact  performance. 
These are divided into transitory states, which can be modified relatively easily, and stable traits, which
are much more difficult to modify.

Transitory States. Transitory states can be considered to be initial states, such as the amount of
rest, level of physical fitness, etc., which may or may not be appropriate for the mission. These are
depicted in the center right portion of Figure 2-2. Training is, of course, an important factor. Indeed,
training is sometimes considered to be the single most important factor in mission success/failure and
often a panacea: If the mission fails, provide more training. Certainly, training and specific skill acquisition
are important and extend the operator's capability to handle workload (Bainbridge, 1978). In the context
of Figure 2-1, this would be represented by increasing the effective area of Region 2. Harris, Tole,
Stephens, and Ephrath (1982) have expressed similar ideas. However, there are numerous aspects of
high workload which cannot be handled by additional training, for example, the requirement to perceive
faster. Many of these high workload factors are related to the cognitive processes of the operator.

Stable Traits. In addition to transitory states, the human operator is characterized in the left center
portion of Figure 2-2 by several interrelated facets which change slowly over time: goals/motivational
state, knowledge/skills, and processing capabilities. Processing capabilities refer to the operator's
higher-level behavioral components (e.g., thinking) which interact with and integrate knowledge and skills to
accomplish task element goals.

Individuals may differ in the relative importance of various goals, the extent to which those goals are
currently satisfied, and the extent to which performing a given task is perceived as being important to goal
achievement. They may also differ in their perceptions of the speed and accuracy with which a task needs
to be done. These factors, in turn, determine the level of motivation for task accomplishment and,
consequently, the effort an individual is willing and able to put forth in accomplishing the task. The
motivational aspect of workload is often ignored by researchers. Gopher and Donchin (1986) handle
the motivation issue by ruling it out; they assume that every operator is highly motivated and wants to
maximize his or her performance.

The  cognitive  processing  capacities  of  an  individual  are  distinguished  here  from  the  knowledge  and 
skills an individual has acquired through training and experience. Knowledge (e.g., facts, rules,
equipment usage procedures) can be considered as a resource of the individual to be utilized by
cognitive processes. To use that knowledge, however, the individual must invoke other dynamic
processes  to  retrieve  and  manipulate  the  knowledge  required  to  execute  a  task.  Other  cognitive 
processing  capabilities  are  needed  to  glean  information  from  displays  and  to  manipulate  controls. 

Individual Differences among Operators

Humans are known to differ in terms of individual traits or capacities that can impact task performance.
Two individuals may differ from each other in a variety of ways which may make accomplishment of the
same task easier, faster, or better for one individual than for the other. Physical size and strength are two
obvious dimensions along which differences may be observed.

More important in modern technological systems are the mental and cognitive differences among
individuals. A list of the important cognitive components is probably longer than a list of the researchers
studying the problem. Some of these variables include information processing, perceptual processing,
decision making, numerical operations, and spatial processes used for tasks such as map reading. Some
of these variables are represented in the Armed Services Vocational Aptitude Battery (ASVAB). The
aviation community has led the way in using such tests in selection. However, it is probably fair to say that
little research has been done exploring individual differences in cognitive skills in the context of workload.

Summary 


This,  then,  is  the  situation  we  need  to  study  and  it  is  complex.  Clearly,  workload  and  human 
performance are affected by external influences and by operator states, both transitory and stable. How do
we measure performance success or failure? Because there are many determinants of performance,
researchers have devised many ways to predict and to measure their influence on behavior. Each method
may provide different answers. Thus, the way the question is phrased and the approach to assessing
behavior become important. Consequently, workload is generally considered to be a complex and
multidimensional concept.


Definitions of Operator Workload

A parade of definitions can be rather dull. However, various authors have discussed the meanings and
definitions of workload in different manners. Even though no single definition of workload is generally
accepted, it is well to organize the various threads of thought into a more coherent and practical package.
Accordingly, a review of the alternative definitions will be instructive. What we will find is that each author
has a different twist and this twist is reflected in associated research efforts. The differences often stem
from an incomplete understanding of underlying mechanisms and processes. So it is in workload;
workload is not a unitary concept but, in fact, a multidimensional one. The particular definition one adopts
has extremely important implications in the application of the various techniques to measuring workload.

Webster's defines workload in the following ways:

workload n 1: amount of work or of working time expected from or assigned to an
employee. 2: the total amount of work to be performed by a department or other
group of workers in a period of time (Webster's Third International Dictionary, 1976, p.
2635).

A  scientific  definition  becomes  much  more  detailed  than  just  amount  of  work  or  of  working  time.  Rather 
than  just  considering  the  individual,  one  can  consider  parts  of  the  individual.  Thus,  one  can  analyze  the 
amount  of  work  done  by  the  hands  or  by  the  eyes,  or  any  other  part  of  the  body.  A  common  distinction 
made  along  these  lines  is  between  physical  and  mental  workload.  Similarly,  the  definition  implies  some 
external  agency  defining  the  amount  of  work  and  the  number  of  things  to  be  done.  Bosses  are  good  at 
that.  However,  for  purposes  of  argument,  we  could  also  consider  workload  from  the  employee's 
viewpoint.  Comparing  the  two  viewpoints  may  show  a  discrepancy!  Indeed,  we  will  discuss  the  viewpoint 
of  some  investigators  who  state  the  latter  viewpoint  is  the  correct  viewpoint. 

Webster's  second  definition  refers  to  crew  workload  and  will  not  be  discussed  in  this  volume.  Individual 
operator  workload  relates  to  personnel  and  training  considerations;  crew  workload  relates  to  manpower 
considerations  as  well.  At  a  basic  level,  the  term  workload  carries  a  number  of  meanings  within  the  military 
community,  especially  the  second  dictionary  definition.  In  particular,  within  a  MANPRINT  context, 
workload  often  is  associated  with  the  number,  frequency  and  durations  of  activity-based  tasks  performed 
by a specific number of Army personnel of particular Military Occupational Specialties (MOSs), skill levels,
and paygrades. It is clear in this context that workload does not refer to cognitive/physical underload or
overload, but rather to task-based manning considerations. Obviously, care must be taken to specify
clearly what is being discussed when using terms like workload and workload analysis. Crew workload and
manpower considerations are closely tied to the potential cognitive overload of individual operators (Hill &
Bulger, 1988), but they are different and should be clearly differentiated.

The  discussion  of  the  human  operator  model  suggests  that  an  operator's  performance  on  a  given  task 
depends  not  only  on  the  demands  of  the  task,  both  in  accuracy  and  time,  and  the  situation  in  which  it  is 
embedded, but also on the capability and the willingness of the operator to respond to those demands. A
difficulty in defining operator workload is that there are alternate, legitimate ways in which workload can be
considered. We will not consider all possible definitions, but rather just the set that has been most often
used by researchers. To a large extent, definitions depend on the techniques used and the
constraints imposed by those techniques. In this section, three broad categories of workload definitions
are  discussed: 

•  amount  of  work  and  number  of  things  to  do, 

• time and the particular aspect of time one is concerned with, and

• the subjective psychological experiences of the human operator.

We  will  consider  each  of  these  categories  from  several  vantage  points.  The  first  two  are  congruent  with 
the  first  dictionary  definition  and  have  parallels  with  traditional  time  and  accuracy  performance 
measurement.  The  psychological  dimension  is  added.  Doing  so  reveals  gaps  in  research  which  obviously 
have  implications  for  application.  Although  it  is  somewhat  premature,  we  will  also  relate  the  definitions  to 
follow  to  the  workload  assessment  techniques  employed. 

Every reader is familiar with the fable of the three blind men examining the elephant. Each of the blind
men was right in his observation but wrong in his conclusion. Much of what will be discussed in the next
section is a living example of this fable. But science is like that. We obtain one observation at a time, and
through a collection of observations, the truth begins to emerge. Later we will describe the elephant
called  workload.  First,  however,  let  us  review  some  observations. 

Amount of Work/Number of Things To Do

To  quantify  operator  workload,  some  researchers  have  sought  to  identify  the  absolute  amount  of  work 
required to complete a given task. Although this is a desirable goal, it must be recognized that the actual
amount of work needed to complete a given task (e.g., assessing a tactical situation) varies with the
situation (e.g., the number of targets on a screen). Nevertheless, a given task will still possess a
distribution of amounts of work required, and it might be useful to estimate that distribution. This approach
to quantifying workload considers it as a function of the task and situation - a point of view external to the
human operator. A parallel conception is the number of things which have to be done in a psychomotor
context (Dick, Brown, & Bailey, 1976). Both of these conceptions are performance based. Note that this
conception implies an accuracy or a quality component of human performance. The quality is not always
well defined; sometimes it is just in terms of satisfactory completion: 'Any landing you can walk away from is
a good landing.'

The concept of work in the physical sciences is readily understood. It is sometimes less clear what work
means for biological systems. There is a large overlap in the concept of work for machines and humans,
and it is instructive to describe an analogy between them. First, work is not performed without some cost.
Energy or other resources must be expended for work to be accomplished; for example, gasoline is
stored in a vehicle's tanks, electricity is stored in batteries, etc. Second, the burning of fuel and oxygen
results in energy being released. Third, the rate at which fuel is burnt may change from moment to
moment depending on the current demands of the situation. The vehicle could also run out of fuel.
Something or someone must detect or be aware of the changing situational demands and regulate the
rate at which fuel and oxygen are being delivered to the engine.

The Stable Capacity of an Individual. While most would agree that the amount of work to be done
is an important element of task workload, the amount of work must be considered in relation to the capacity
of the individual to perform that work. Here again, there are excellent analogies to mechanical workloads.
For example, we may define a task as moving a wagon having a particular load from one location to
another. A vehicle having a large capacity motor may experience no difficulty in performing that task.
However, as the capacity of the motors of alternative vehicles gets smaller and smaller, greater and greater
difficulty will be experienced in performing that task. In fact, at some point, the load might be too much for
one of the vehicles to handle. In the same fashion, humans differ in their capacities to perform a given
task. Some might find a task easy to do while others might find that same task impossible to perform. This
viewpoint of workload represents a conception of workload internal to the operator rather than external to
him. Furthermore, the capacity of the human is assumed to be fairly stable across time, as might be
found by administering personnel selection tests.

There  are  two  different  meanings  for  the  term  'capacity'.  One  involves  considerations  between 
individuals (individual differences) and the way performance and workload differ from one individual to
another. Little work has been done in this area. The other meaning refers to a single individual and is
used  in  the  context:  How  much  more  can  the  operator  do?  This  latter  meaning  has  been  considered  in 
much greater detail by researchers.

Spare Capacity of an Individual to Perform Other Tasks. Much discussion of workload has been
based on the foundation of information processing concepts. Gopher and Donchin (1986) suggest that
workload implies "limitations on the capacity of an information processing system" (p. 41-3). Gopher and
Donchin (1986) and Kantowitz (1985; 1987a) overview some of the more prominent theoretical models
related to the workload area. In these overviews, a major theoretical perspective of workload is the spare
capacity model. Under this formulation, the human is viewed as having a limited capacity or ability with
which to process information. A simplistic example would be a person who has the capacity to receive and
process a specific amount of information. If that person is currently using only 25% of that capacity, then
the person has 75% spare capacity currently not in use.
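
The arithmetic of the simplistic example above can be written out as a short sketch. It assumes, purely for
illustration, that capacity in use and the demand of an added task can be expressed as fractions of a single
fixed pool; the function names spare_capacity and fits_within_spare are hypothetical and do not come from
the literature cited here.

```python
def spare_capacity(used_fraction: float) -> float:
    """Fraction of a single capacity pool that is not currently in use."""
    if not 0.0 <= used_fraction <= 1.0:
        raise ValueError("used_fraction must lie between 0 and 1")
    return 1.0 - used_fraction


def fits_within_spare(used_fraction: float, added_demand: float) -> bool:
    """True if an added task's demand does not exceed the spare capacity.

    Assumption: demand is measured on the same 0-to-1 scale as capacity,
    which is a simplification made only for this illustration.
    """
    return added_demand <= spare_capacity(used_fraction)


# The example from the text: 25% of capacity in use.
print(spare_capacity(0.25))
print(fits_within_spare(0.25, 0.50))
print(fits_within_spare(0.25, 0.90))
```

Under these assumptions, an operator using 25% of capacity could take on an added task demanding 50% of
capacity but not one demanding 90%.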

Resources Available. A related model also based in information processing is referred to as the
multiple-resources theory. In this theory, multiple pools related to specific abilities, such as verbal and
spatial, are postulated to exist. Workload is then considered in the context of utilization of the abilities,
singly and in combination. Much work has been done in support of this theory (e.g., Navon & Gopher,
1979; Wickens, 1980; 1984) that suggests there will be less competition for the limited resources, and
hence less overall workload, when controls and displays do not all require the same resource pool (e.g.,
verbal) for processing and controlling than if the display and associated control require the same
resources. (Our second example at the beginning of this chapter, interleaving recitation of the alphabet
and numbers, is an example of competition for memory resources.) From this perspective, "mental
workload can be described as the cost of performing one task in terms of a reduction in the capacity to
perform additional tasks, given that the two tasks overlap in their resource demands" (Kramer, Sirevaag, &
Braune, 1987, p. 146). However, this theoretical perspective has its skeptics who suggest that single
pool capacity is sufficient; multiple pools of capacity are simply unnecessary to explain human information
processing (e.g., Navon, 1984; Kantowitz, 1987a).

Time-Based Conceptions - Working Time

The  preceding  section  discussed  several  ways  in  which  researchers  have  described  workload  in  terms 
of amount. In this section, we consider the issue of time. Three different ways of considering operator
workload are described, all based on temporal elements. Each defines workload in relation to some time
component, as in the amount of something that has occurred, is occurring, or is scheduled to occur.
Simply defining workload as amount of working time fails to inform us of whether we should attend to (a)
the past, work completed, (b) the present, work currently being accomplished, or (c) the future, work
scheduled  and  work  anticipated. 

The future is the easiest to deal with. To date, there have been few published discussions of work
scheduled as a factor determining workload. Nevertheless, the current activity of an individual will be
influenced by what has to be accomplished later. As Hart (personal communication, Ju> 1987) has
pointed out, the amount of time spent on a current task is influenced by the known and expected time
requirements of future tasks. Sheridan and Simpson (1979) call this nearness to deadlines.

One of the most commonly used conceptualizations of time involves the present. It includes the time
required (Tr) for a task in relation to the time available (Ta) to perform the task: Tr/Ta (e.g., Holley & Parks,
1987). A ratio of greater than 1 implies that the task cannot be done in the time allotted; a ratio less than 1
indicates acceptable times. This is a performance definition of workload; the task can be done within the
time frame or it cannot. The Tr/Ta ratio defines an important but limited condition for overall workload
definition. Normally, the application of this definition assumes an acceptable quality of performance when
the task is completed, but the definition does not take into account the degree of quality of performance.
Like the amount definition, inferences about workload are made from performance. If the task can be
accomplished within the time available, then the operator may have spare time and spare capacity. The
Tr/Ta ratio is also called time stress by some authors.
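
The Tr/Ta ratio described above can be illustrated with a minimal sketch. The function names, the example
times, and the label used for a ratio of exactly 1 are assumptions made here for illustration; the text
specifies only the interpretation of ratios greater than and less than 1.

```python
def time_ratio(time_required: float, time_available: float) -> float:
    """Return Tr/Ta for one task; both times must be in the same unit."""
    if time_required < 0 or time_available <= 0:
        raise ValueError("time_required must be >= 0 and time_available > 0")
    return time_required / time_available


def interpret(ratio: float) -> str:
    """Map a Tr/Ta ratio onto the interpretation given in the text."""
    if ratio > 1:
        return "cannot be done in the time allotted"
    if ratio < 1:
        return "acceptable time"
    # Assumption: the text does not say how to read a ratio of exactly 1.
    return "no time margin"


# Hypothetical tasks: (time required, time available), in seconds.
for tr, ta in [(6.0, 8.0), (15.0, 10.0)]:
    ratio = time_ratio(tr, ta)
    print(f"Tr={tr} s, Ta={ta} s, Tr/Ta={ratio:.2f}: {interpret(ratio)}")
```

As the text notes, the ratio says nothing about the quality of performance; the sketch reports only whether
the time requirement fits within the time available.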

A  quite  different  approach  is  to  consider  the  time  already  expended.  Although  you  will  not  read  much 
about it in this volume, this is related to the effects of fatigue and the issue of workload duration. That is, a
greater effort may be needed to perform an act if the person's current capacity for that action has been
depleted or is currently low. Mental effort may not require great amounts of physical energy and the laws
may differ for mental and physical fatigue. Nevertheless, probably everyone has had the experience of
being pushed to the point that it is relatively difficult to think, leading to slower processing. Indeed,
performance on a variety of cognitive tasks declined in a sustained command and control environment
(Angus & Heslegrave, 1983). Mean time to process messages increased, showing the operators were
working more slowly. Similarly, the number of correct responses decreased on a logical reasoning task
and other tasks. However, errors did not necessarily increase on these tasks, indicating slower but equally
accurate  performance. 

Composite  Conceptualizations 


Having  understood  the  limitations  with  the  definitional  approaches  described  above,  several 
researchers have suggested that workload is really a composite of several different things. For example,
Jahns  (1973)  proposed  that  workload  can  be  thought  of  as  containing  the  components  of 

•  input  load, 

•  operator  effort,  and 

• performance.

According to Jahns, input load is the task requirements (situation demands in Figure 2-2) imposed on the
operator, that is, what is required of the operator. The second component is the degree of effort being
expended by the operator to accomplish the requirements. The third component relates to operator
performance and to what degree the required tasks have been accomplished.

Many other investigators also consider workload to be a multidimensional concept. Workload has been
expressed as a global concept that affects operators in relation to their ability to accomplish a task (e.g.,
Hart, 1986b). Edholm and Weiner (as cited in Rohmert, 1987) suggest that workload is the total of all
determinable influences on the working person. Therefore, all elements of work, including environmental,
social, motivational, and other factors, will affect the workload. There can be little doubt that there are
individual preferences regarding what workload means and the factors that may cause it. Certainly this was
the case when, for example, Hart, Childress and Hauser (1982) asked 117 people which of 19 possible
components were a primary component of, were related to, or were unrelated to workload. Each of the 19
components was considered as primary by at least 25% of the individuals. However, only task difficulty
and time pressure were considered a primary component by more than 72% of the raters.

Subjective vs. Objective


Amount of work and time to do the work are two objective ways of inferring workload. Somehow,
however, they do not capture all there is to workload. In the driving example, performance remained
acceptable throughout, but the perceived difficulty of the task increased in both time pressure and the
amount of work. One would like to capture the level of perceived difficulty as an indicator of when the task
will become too difficult. Workload researchers have recognized this omission and defined workload in
the context of subjective and psychological variables.

Effort  Needed  to  Perform  a  Task.  Closely  associated  with  the  performance  of  a  task  is  the  effort 
needed to do a task. From this standpoint, workload depends not only on the particular task to be
accomplished, but also the current capacity of the operator to perform the task. That is, a greater effort will
be needed to perform an act, not only if the person's capability to perform that task is inherently limited,
but also when the resources needed to perform the task have been partially depleted. For example, one
might measure the actual physical work being done by a person doing pushups by determining the actual
distances and weight being lifted. Some persons who are in better physical condition will have little
difficulty in doing a certain number of pushups. For others, the same task can only be done with great
difficulty. However, because of the progressive depletion of resources during this task, the final pushup
may be perceived as having required considerably more effort than the first.

The  concept  of  workload  as  effort  also  considers  workload  to  be  something  internal  to  the  operator. 
This  makes  the  definition  dependent  not  only  on  the  normal  capabilities  of  the  individual,  but  also  on  the 
current states of the operator. Thus, if workload is defined as operator effort, should one be measuring
efforts expended, efforts anticipated, or effort currently being put forth? All three ways are implicit in
formal models of the human operator, but have not always been included in definitions of workload.

Subjective Experience When Performing a Task. Some researchers have viewed workload as
subjective experience. Johannsen, Moray, Pew, Rasmussen, Sanders and Wickens (1979) concluded
that, "If the person feels loaded and effortful, he is loaded and effortful whatever the behavioral and
performance measures show" (p. 105). Similarly, Sheridan (1980) suggested that "mental workload
should be defined as a person's private subjective experience of his or her own cognitive effort" (p. 1).

Sheridan and Simpson (1979) have suggested that there are three categories of words that are used
when talking about workload. There are words associated with task time constraints, such as the time
available to complete work, the number of interruptions and the nearness of deadlines. There are also
those words that are related to the uncertainty and complexity associated with a task. These include such
things as uncertainty as to what the tasks are and what the consequences of various tasks will be, as well
as the type and amount of planning that must be done to accomplish the task. The third kind of words are
those related to psychological stress such as risk, frustration, confusion, and anxiety.

This  three-dimensional  definition  based  on  time  constraints,  task  complexity,  and  psychological  stress 
was  adapted  and  operationalized  by  Reid,  Shingledecker  and  Eggemeier  (1981)  for  use  in  their 
Subjective Workload Assessment Technique (SWAT). Time load refers to the relative amount of time
available to the operator (AAMRL, 1987) and the percentage of time an operator is busy (Eggemeier,
McGhee & Reid, 1983), and includes elements such as overlap of tasks and task interruption. Mental
effort (task complexity) refers to the amount of attention or concentration directed toward the task,
independent  of  time  considerations.  Psychological  stress  is  the  degree  to  which  confusion,  frustration, 
and/or  anxiety  is  present  and  adds  to  the  subjective  workload  of  the  operator.  Factors  that  may  increase 
stress  and  elevate  distraction  from  the  task  include  personal  factors  such  as  motivation,  fear  or  fatigue,  and 
environmental  factors  such  as  temperature,  noise,  or  vibration  (AAMRL,  1987). 

Summary  Comments 

Stating  that  operator  workload  is  a  multidimensional  concept  may  appear  reasonable,  at  first  glance,  but 
it tends to beg the question of what workload really is. Workload is often used as a practical, atheoretical
term. Sometimes, workload is defined in terms of the amount and number of tasks to do and the time
available to do them. Instead of attempting to define the concept, these approaches tend to imply how
workload should be measured and assessed. In many cases, there is little in the definition to distinguish
between workload and performance. Some definitions are more internal and include psychological
dimensions such as stress, effort, and difficulty. However, one can appreciate the complexity of the
operator  workload  concept  by  noting  all  the  facets  that  have  been  ascribed  to  it  from  the  various 
definitions  or  conceptualizations. 

What We Mean by Workload: An Analogy

Earlier  in  this  chapter,  it  was  pointed  out  that  workload  is  not  the  same  as  performance  although 
workload  is  related  to  performance  and  assumed  to  be  a  determiner  of  the  quality  of  performance.  It  was 
also pointed out that there are a variety of influences on performance in the context of a human model:
Performance is the coin of the realm. The various definitions of workload hint at what is deemed to be
important, specifically, number of things to do, the time to do them in, and psychological factors. These
points  all  describe  performance  and  workload  from  a  relatively  static  standpoint. 

However, an operator is highly adaptable and dynamic. By putting forth more effort for short periods of
time, adequate performance can often be maintained even on tasks that are too difficult or too complex to
handle for extended periods of time. But high workload conditions take their toll: they deplete resources
needed for various capabilities, and they may well result in inadequate performance in the future. An
analogy would be if a design engineer evaluated the performance of a new vehicle only by the distance it
was capable of traveling without ever considering the size of the fuel tank or the rate at which fuel was
being used. The fact that the vehicle can reach long distances under some conditions does not mean
that it can always reach those distances. To take the analogy further, it is also true that the relationship
between engine load in revolutions per minute (RPM) and fuel consumption is non-linear. Requiring the
vehicle to travel a specified distance at a very low or high RPM will use more resources per unit distance
than if the same distance were traveled at an optimally efficient RPM.

These characteristics are depicted in schematic form in Figure 2-3. The ordinates show the load on the
engine in RPM and distance or vehicle range as performance. In addition, the capacity of the fuel tank or
amount of fuel available is represented as a parameter with several different capacities shown as curved
lines. To determine the distance that can be traveled (performance), one needs to know RPM and the
size of the tank. There are boundaries. RPM cannot exceed some practical maximum, i.e., the red line if
engine damage is to be avoided, and obviously, if RPM is zero, no distance will be traveled. Similarly,
there are limitations of the capacity of the fuel tank; it cannot be zero and there is a practical maximum. If
one wanted information about a 12.5 gallon tank, one would interpolate between 10 and 15.
Performance of the vehicle under varying conditions can thus be described in non-linear terms with
respect to RPM and in terms of a performance envelope which is represented as the white space in the
figure.

Figure 2-3. A schematic representation of a vehicle performance envelope as a function of workload/RPM
with fuel tank capacity as a parameter.
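
The relationships in the vehicle analogy can be illustrated with a small numerical sketch. The Python fragment below is not taken from the report: the shape of the economy curve, the optimal RPM, the red line, and the miles-per-gallon figure are hypothetical values, chosen only to reproduce the qualitative behavior described above. Range falls off on either side of an optimal RPM, grows with tank capacity, and is undefined beyond the red line. A 12.5 gallon tank is handled by interpolating between the 10 and 15 gallon curves.

```python
# All constants are hypothetical and serve only to illustrate the analogy.
OPTIMAL_RPM = 2500.0           # assumed most efficient engine speed
RED_LINE_RPM = 6000.0          # assumed practical maximum (envelope boundary)
BEST_MILES_PER_GALLON = 30.0   # assumed economy at the optimal RPM


def miles_per_gallon(rpm: float) -> float:
    """Toy non-linear economy curve that peaks at OPTIMAL_RPM."""
    if rpm <= 0:
        return 0.0  # engine not turning: no distance is covered
    if rpm > RED_LINE_RPM:
        raise ValueError("RPM beyond the red line is outside the envelope")
    deviation = (rpm - OPTIMAL_RPM) / OPTIMAL_RPM
    return BEST_MILES_PER_GALLON / (1.0 + deviation ** 2)


def vehicle_range(rpm: float, tank_gallons: float) -> float:
    """Distance (performance) for a given load (RPM) and capacity (tank)."""
    if tank_gallons <= 0:
        raise ValueError("tank capacity must be greater than zero")
    return tank_gallons * miles_per_gallon(rpm)


def interpolate_range(rpm: float, tank_gallons: float,
                      lower_tank: float = 10.0,
                      upper_tank: float = 15.0) -> float:
    """Linear interpolation between two plotted tank-capacity curves."""
    low = vehicle_range(rpm, lower_tank)
    high = vehicle_range(rpm, upper_tank)
    fraction = (tank_gallons - lower_tank) / (upper_tank - lower_tank)
    return low + fraction * (high - low)
```

Under these assumed values, moving from the optimal RPM to a higher RPM with the same tank shortens the range, which corresponds to a shift of position inside the envelope rather than a change in the envelope itself.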


There are several other points one can make in this context. Dynamic changes made in the course of
execution can be represented in the figure. At or above the optimal RPM, an increase in RPM will result in
a reduction of travel range which would be represented as a shift of relative position within the envelope;
for example, an increase in RPM from point A in the figure to A' results in a lower vehicle range. Similarly,
one would want to plan a safety margin. For example, in aviation, the pilot is responsible for calculating the
amount of fuel needed to reach the destination, plus the amount needed to reach an alternate airport,
plus a further safety margin of at least 10%. In aviation, it is standard practice to stay away from the
performance envelope boundaries.

In a similar way, we can consider human performance in terms of a performance envelope. We show
this in Figure 2-4 which is basically a human analogy to Figure 2-3. In this case we have depicted workload
(time or amount) and performance on the ordinates. The parameter can be viewed as an estimate of the
operator's current states, in short, his current capability. There is a parallel between a dynamic change in
the vehicle analogy and a dynamic change in human work. Both performance functions are non-linear;
unlike the vehicle example, however, the amount of currently available human capacity can vary with
changes in effort expended, at least up to a limit. As in the vehicle example, one would want a safety
margin and that is attained by avoiding the performance envelope boundaries. But in order to avoid the
boundaries, one needs to know where in the space the operator currently is, how much additional work is
coming in, and the rate at which this additional work will cause the operator to move toward a boundary.
Thus, workload cannot be evaluated merely by knowing the amount of work that a task requires of the
human. One also needs to know the rate at which the work must be done and the extent to which it will
deplete the human resources that are available, not only for the current task, but for others that will be
occurring in the future. In short, one needs to know where the operator is in the performance envelope at
any given time.

Figure 2-4. A schematic representation of human performance and the workload envelope.

Nor is it sufficient merely to consider the impact of various tasks on the average operator. Individuals
differ in their capabilities and resources at the beginning of a mission, and those differences may become
more pronounced as the mission unfolds. Operators who start a mission with lesser capabilities may have
to expend their limited resources faster than those who began with greater capabilities. Task demands
and the likelihood of humans being able to accomplish them cannot be analyzed and evaluated without
considering both the individual and the impact that previous tasks may have had on the individual. Just as
we are interested in the readiness of a military unit to respond to various types of demands that may be put
on it, the practitioner should be interested in the moment-to-moment readiness of the individuals to
respond to various task demands. That is, we need to know the starting position (the capacities of the
mental and physical fuel tanks) of the operator in the workload space and how close the operator is to the
boundaries of the envelope.

What  We  Mean  by  Workload:  Describing  the  Elephant 

The review of the definitions did not indicate any overwhelming unanimity among the authors. Indeed,
the  definitions  reviewed  have  more  in  common  with  assessment  technique  descriptions  than  with 
conceptual  definitions  per  se.  Having  made  these  few  points,  let  us  be  venturesome  and  extract  some 
conceptual  principles  concerning  workload.  Our  tenets  of  workload  are: 

•  Workload  is  relative.  It  depends  on  both  the  external  demands  and  the  internal 
capabilities of the individual. This relativity exists in both dimensions of amount and
time,  e.g.,  it  can  vary  over  time  for  an  individual. 

•  Workload  causes  the  individual  to  react  in  various  ways.  Workload  is  not  the  same  as 
the  individual's  performance  in  the  face  of  work  or  tasks. 

•  Workload  involves  the  depletion  of  internal  resources  to  accomplish  the  work.  The 
higher  the  workload,  the  faster  resources  are  depleted. 

•  There is a diversity of task demands and a corresponding diversity of internal
capabilities  and  capacities  to  handle  these  demands.  Persons  differ  in  the  amount  of 
these  capabilities  that  they  possess. 

Out of these tenets we can derive a working definition of workload. It is not the intention here to
propose the definitive meaning, but rather to suggest the working definition for the purposes of
understanding and of practical application. In the sense that workload and performance are related in the
manner shown in Figure 2-1, what is really of interest is to predict that point just short of rapid degradation
of performance. This can also be stated in terms of the current vs. future position of the operator in the
workload envelope of our analogy in Figure 2-4. Past performance can be measured, but the future ability
of the operator to perform is what the practitioner would like to know. Where in that hypothetical
performance envelope does the operator currently lie? In this sense, the aspect of workload that needs
most to be estimated or measured is considered to be the relative capacity to respond. This working
definition is meant to imply not only the amount of spare capacity, but also the ability of the operator to use
that capacity in the context of the specific personal and environmental situation.

By proposing a working definition as the relative capacity to respond, the emphasis is on predicting
what the operator will be able to accomplish in the future. It is a global definition in that it does not
necessarily attempt to explicate the specific factors or dimensions that will influence individuals in their
performance or perception of workload. (The definition is, however, consistent with all the points made.)
At all times, workload will involve the interaction of the operator with the task and these two elements
cannot be separated totally. At the same time, the circumstance will dictate to what extent operator
characteristics or task characteristics will be important in the assessment of workload. The specific
situation will determine the most appropriate questions to ask about operator workload, and consequently
the most appropriate ways to answer those questions.

Taxonomies  of  Workload 


Our working definition cuts across the various techniques used in workload assessment. To discuss
the techniques we need a different framework, and for that organizational framework, we utilize a
taxonomy. Taxonomies are developed as aids in classification. Classification serves the useful purpose of
grouping similar items together as well as being helpful in explicating their structure. Researchers have
used various workload taxonomies for the two main purposes of (a) classifying the nature of the operator
tasks and (b) classifying workload assessment techniques.

Task taxonomies are useful because some workload techniques appear to be able to discriminate high
and low levels of workload in some types of tasks better than others. Often this differential discrimination
results from the specific design of and the intention behind the technique. A task taxonomy can be useful
in helping to determine the more appropriate workload techniques for a specific application.

Taxonomies  also  have  been  developed  to  classify  workload  methods  and  techniques  into  descriptive 
categories.  As  will  be  discussed  later,  some  categories  of  methods  are  more  useful  for  specific 
circumstances  than  others;  and  the  classification  scheme  provides  a  convenient  vehicle  for  categorization. 
By  classifying  both  tasks  and  techniques,  matches  may  be  found  more  easily. 

By Task. A comprehensive review of the operator workload literature was completed nearly a
decade ago by Wierwille and Williges (1978). In the report, they provided a survey and analysis of 400
workload studies. For classification, they used a human operator task taxonomy (called Universal Operator
Behaviors) that had been developed earlier by Berliner, Angell, and Shearer (1964). In this task
taxonomy, human activities in systems are separated into four broad categories:

• Perceptual tasks or sensing tasks: for example, seeing a warning light on an
instrument panel;

• Mediational or cognitive tasks are those that involve thinking (e.g., solving
mathematical problems);

• Communication includes face-to-face speaking, radio, and other communication
tasks; and

• Motor processes are those which involve muscles or body movement (e.g., activating
a pushbutton).

The  Universal  Operator  Behaviors  taxonomy  has  been  adopted  by  other  workload  researchers  as  a  useful 
task  taxonomy.  (A  complete  description  of  the  taxonomy  appears  in  Chapter  8).  Some  of  these 
categories  will  be  discussed  indirectly  in  the  context  of  the  review  of  techniques. 

By Technique. Wierwille and Williges (1978) separated workload techniques and measures into
four categories, namely subjective opinion, spare mental capacity, primary task, and physiological
measures. Having developed these categories, they used them to categorize workload techniques with
respect to operator behaviors. Other taxonomies of workload have been developed as well, including
subjective/objective, subjective/performance/physiological, and similar variants. For example, Johannsen
(1979) suggests a four-group classification for techniques to measure operator effort: time-line analyses,
information processing studies, operator activation-level studies, and subjective effort ratings. Moray
(1979a) suggests that OWL techniques be divided into normative, physiological, and empirical measures
corresponding to the three components of the structure suggested by Jahns (1973), specifically, input
load, operator effort, and performance. As suggested by Moray (1979a), normative measures include
those which look at the input load, such as queueing theory; physiological measures include those that
attempt to measure the effort or activation level involved, such as heart rate or EEG; and empirical
(behavioral) measures are those related to performance such as reaction time or root mean squared (RMS)
error.

Other  researchers  suggest  classification  schemes  with  more  categories.  For  example,  Strasser 
(Hamilton,  Mulder,  Strasser,  &  Ursin,  1979)  has  developed  a  taxonomy  of  OWL  methodologies  with  eight 
categories: 


•  Vegetative  variables  -  heart  rate,  blood  pressure,  respiration,  galvanic  skin  response; 

•  Central  nervous  variables  -  electroencephalogram,  evoked  potentials; 

•  Biochemical  variables  -  hormone  levels  in  bodily  fluids; 

•  Peripheral  variables  -  pupil  diameter,  electrooculogram,  critical  flicker  fusion 
frequency; 

•  Subjective  methods  -  rating  scales; 

•  Loading  tasks  -  continuous  and  discrete;  paced  and  self-paced  tasks; 

•  Performance measures - reaction time, etc.; and

•  Observations - task analysis and behavioral measures.


The categorization of techniques in this taxonomy reflects a particular interest in physiological measures of
workload (Strasser, 1937). Clearly, classification schemes are created to meet specific needs of the
researcher and the intended user and application.

An Expanded Technique Taxonomy


The workload technique taxonomy used for this report is shown in Table 2-1. It is designed to be
flexible and meaningful in addressing OWL issues in the Army. It differs from previous taxonomies in that
greater emphasis is placed on those analytical techniques that can be used to predict OWL during system
concept development and preliminary system design. Previous taxonomies and most OWL research
have concentrated on empirical techniques that are applicable only at more advanced stages in the
system development cycle; that is, they are test and evaluation oriented rather than design oriented. The
term analytical is used to label techniques which are used in a predictive manner without actually
employing an operator; operator-in-the-loop techniques are labeled empirical. It is quite clear that the
Army needs both types of techniques. The taxonomy is elaborated in the discussion of classes of
techniques in subsequent chapters and presented in detail to include all techniques in Chapter 8.

Analytical Techniques. The focus of the analytical techniques is on workload analysis that may be
applied without operators-in-the-loop. These techniques are used to predict workload early in system
development where the greatest design flexibility is available with the least impact on system cost. These
techniques may also be used throughout hardware development to guide, augment, or extrapolate
beyond operator-in-the-loop investigations. The analytical techniques are classified into five categories:
(a) Comparison; (b) Expert Opinion; (c) Mathematical Models; (d) Task Analysis Methods; and (e)
Simulation Models. These analytical categories are discussed in detail in Chapter 3.

Empirical Techniques. The empirical techniques have received considerable attention and are
the most familiar methods (O'Donnell & Eggemeier, 1986). The taxonomy of empirical techniques
presented here includes four major categories (O'Donnell and Eggemeier, 1986) and is similar to that
developed by Wierwille and Williges (1978). These include: (a) Primary task measurements, which focus
on the degree to which human and system performance achieve stated goals; (b) Subjective methods,
which assess operator opinion and include rating scales as well as questionnaires and interviews; (c)
Secondary task approaches, which have been used to examine the amount of operator spare capacity; and
(d) Physiological techniques, both classical (e.g., heart rate) and specialized (e.g., heart rate variability or
evoked potentials), which continue to be examined as to their most appropriate application in workload
assessment. These classes of techniques are discussed in Chapters 4, 5, 6, and 7, respectively.




Table 2-1. Taxonomy of workload assessment techniques.

TECHNIQUE    CATEGORY                 SUBCATEGORY

Analytic     Comparison
             Expert Opinion
             Math Models              Manual Control Models
                                      Information Theory Models
                                      Queueing Theory Models
             Task Analysis Methods
             Simulation Models

Empirical    Primary Task             System Response
                                      Operator Response
             Subjective Methods       Rating Scales
                                      Questionnaire/Interview
             Secondary Task           Subsidiary Task
                                      Probe Task
                                      Dual Task
             Physiological            Classical
                                      Specialized

Some Additional Definitional Issues in OWL

There  are  some  additional  definitions  and  conceptual  tools  useful  for  workload  analysis.  Because  of 
their relevance for analysis and workload assessment, they are addressed in this section. The issues tend
to be more important for empirical techniques than for analytical techniques; however, they are relevant for
both. Analytical techniques use definitions to identify performance and workload measurement. The
developer and user can decide in a relatively direct manner how he wants to assess workload. With
empirical techniques, however, the issue is not quite as straightforward. It is not easy to go back to collect
data  that  were  missed  on  the  first  test.  Nor  is  it  easy  to  scrap  a  technique  and  replace  it  with  another  that 
provides  a  more  desirable  level  of  quality  and  detail.  Clarification  of  these  concepts  at  this  point  will  help 
the  reader  in  evaluating  the  subsequent  review  and  discussion. 

Sensitivity of Techniques and Measures

Sensitivity of workload assessment techniques is the degree to which the various techniques can
differentiate between levels of load placed upon the operator. Some investigators (e.g., Wierwille et al.,
1985) have stressed issues of workload assessment sensitivity. It is generally accepted, mistakenly, that
most empirical workload estimation techniques are sensitive to changes in load imposed on or
experienced  by  an  operator.  In  fact,  the  majority  of  techniques  are  insensitive  when  tested  in  scientific 
experiments.  For  example,  Wierwille  and  his  colleagues  tested  25  different  techniques  in  four 
experiments  and  found  that  only  about  25  to  30%  of  the  techniques  had  any  usable  sensitivity.  However, 
the  sensitivity  also  depends  on  the  appropriateness  of  the  technique  for  the  system. 

Lack  of  sensitivity  is  the  single  most  critical  issue  in  selection  of  an  empirical  technique.  If  an  insensitive 
technique  is  used,  it  will  indicate  there  are  no  changes  in  workload  regardless  of  the  values  of  the 
independent  variables.  This  could  lead  to  systems  with  workload  problems  discovered  only  after  fielding. 
It  is  for  this  reason  that  we  advocate  using  multiple  techniques  when  assessing  workload. 

Diagnosticity

Diagnosticity refers to the extent to which a technique reveals not only overall assessment of OWL but
also information about component factors of that assessment. For example, an important diagnostic is the
ability of a measure to differentiate among various sensory, perceptual, cognitive and psychomotor
aspects of human performance. The concept as used in workload has been attributed to resource theory
(O'Donnell and Eggemeier, 1985) but the basic methodology for such a differentiation can be traced back
to Garner, Hake and Eriksen (1956). The essence of the notion of diagnosticity is to be able to identify the
specific mechanism or process involved or overloaded during performance of a particular task. Typically,
the diagnosis is an inference based on the information available. Garner et al. (1956) have formalized the
concept of converging operations, a diagnostic methodology for attacking the problem in several different
ways to ensure the quality of the inference. The converging operations method is critical to the
diagnosticity of workload techniques.

Diagnosticity is an important issue, but most workload measures are inherently weak in this regard.
Diagnosticity can often be improved significantly by simultaneously recording system changes induced by
the operator, i.e., recording control inputs or other observable behaviors. This is a version of converging
operations. An example is provided by Harris and Christhilf (1960) in which control inputs of the operator
were recorded simultaneously with eye movements. This allowed the investigators to relate control inputs
to dwell times. They found that longer fixation times or dwell times on an instrument were associated with
control inputs while shorter times on the same instrument were not associated with control inputs. Thus, a
long dwell time without a control input would imply difficulty in interpreting the instrument and in turn would
implicate a cognitive mechanism. By inference, more mental activity and therefore more decision
processes were associated with the longer dwell times. Other analyses are consistent with this
suggestion (Dick, 1980). One measure by itself would not have permitted such an inference.
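
The inference described above, combining dwell time with the presence or absence of a control input, can be sketched in a few lines of Python. This is a hypothetical illustration, not the procedure used in the cited study: the `Fixation` record, the function name, and the 1.0 second cutoff for a "long" dwell are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List

LONG_DWELL_S = 1.0  # hypothetical cutoff separating long from short dwells


@dataclass
class Fixation:
    instrument: str       # instrument being looked at
    dwell_s: float        # dwell time in seconds
    control_input: bool   # whether a control input accompanied the dwell


def flag_interpretation_difficulty(
        fixations: List[Fixation],
        long_dwell_s: float = LONG_DWELL_S) -> List[Fixation]:
    """Return long dwells that were not accompanied by a control input.

    Neither measure alone supports the inference: dwell time by itself
    cannot separate deciding-and-acting from struggling to interpret.
    """
    return [f for f in fixations
            if f.dwell_s >= long_dwell_s and not f.control_input]
```

Recording both data streams against a common time base is what makes the joint classification possible; the cutoff itself would have to be chosen from the literature or from preliminary data.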

Technique vs. Measure


A  technique  is  a  generic  term  referring  to  a  workload  assessment  methodology.  A  measure  is  a  specific 
assessment  scale  or  a  metric.  For  example,  collecting  heart  data  with  either  a  wrist  band  for  pulse  or  chest 
electrodes  qualifies  as  a  technique.  Scoring  the  data  for  mean  heart  rate  qualifies  as  a  measure;  it  is  a  form 
of a metric and data analysis applied to heart data. (It may also involve considerably different assessment
scales.) Similarly, evaluating heart rate variability is another measure or metric applied to data collected with
a  heart  technique. 
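
The distinction can be made concrete with a short sketch in which one technique (a recording of inter-beat intervals) yields two different measures. The Python below is illustrative only: the function names are assumptions, and the standard deviation of inter-beat intervals is used as one simple variability index among the many that exist, not as the index intended by any study cited here.

```python
from statistics import mean, stdev
from typing import Sequence


def mean_heart_rate(ibi_ms: Sequence[float]) -> float:
    """Mean heart rate in beats per minute from inter-beat intervals (ms)."""
    return 60000.0 / mean(ibi_ms)


def heart_rate_variability(ibi_ms: Sequence[float]) -> float:
    """Sample standard deviation of inter-beat intervals (ms).

    An assumed, simple variability index; requires at least two intervals.
    """
    return stdev(ibi_ms)


# Two hypothetical recordings collected with the same technique.
steady = [800.0, 800.0, 800.0, 800.0]
alternating = [750.0, 850.0, 750.0, 850.0]
```

Both hypothetical recordings have the same mean inter-beat interval and therefore the same mean heart rate, yet they differ in variability, so the choice of measure determines what the technique can show.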

When selecting empirical estimates of performance to derive workload, an investigator must not only
choose  appropriate  techniques,  but  also  appropriate  measures.  Within  a  technique,  sensitivity  may  vary 
with  the  measure  selected.  For  example,  if  time  estimation  has  been  selected  as  a  technique  to  be  used, 
there  are  many  measures  that  could  be  employed:  absolute  error,  standard  deviation  of  estimates,  root 
mean  squared  (RMS)  error,  or  number  of  no-response  intervals.  Technique  sensitivity  is  often  dependent 
upon  the  measure  used.  Wierwille  and  Connor  (1983)  and  Savage,  Wierwille,  and  Cordes  (1978) 
demonstrated  this  for  two  secondary  task  techniques  (i.e.,  time  estimation  and  digit  shadowing). 
Measures  should  be  selected  carefully  and  should  be  based  upon  previous  research  or  preliminary 
investigation. 
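
As an illustration of how several measures can be derived from a single technique, the sketch below scores one set of time-estimation responses in the four ways listed above. It is a minimal example under stated assumptions: the function name, the use of `None` to mark a no-response interval, and the choice of the sample standard deviation are ours, not the scoring rules of the studies cited.

```python
import math
from statistics import mean, stdev
from typing import Dict, Optional, Sequence


def time_estimation_measures(
        target_s: float,
        produced_s: Sequence[Optional[float]]) -> Dict[str, float]:
    """Score produced intervals against a target interval (seconds).

    A value of None marks an interval on which no response was made.
    At least two responses are needed for the standard deviation.
    """
    responses = [p for p in produced_s if p is not None]
    errors = [p - target_s for p in responses]
    return {
        "mean_absolute_error": mean([abs(e) for e in errors]),
        "sd_of_estimates": stdev(responses),
        "rms_error": math.sqrt(mean([e * e for e in errors])),
        "no_response_count": len(produced_s) - len(responses),
    }
```

Because the four values respond differently to the same data (for example, RMS error weights large misses more heavily than mean absolute error), a manipulation of load may be detected by one measure and missed by another.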

Technique  vs.  Procedure 

A  procedure  is  the  application  of  a  technique  specifying  the  steps  taken  in  applying  a  technique.  This 
is  like  plotting  out  two  different  routes  to  get  from  point  A  to  point  B.  Differences  will  be  in  terms  of  the 
quality of the ride, the time taken, the likelihood of getting lost, etc. Similarly, a given technique can be
applied in different ways and each variation may affect performance differently. For example, time
estimation can be used in many ways. The following are examples of procedural variations:

* Subject produces intervals x seconds long, or x seconds after an auditory or visual
signal.

* Instructions indicate whether task is to be neglected under high load or to be
performed regardless of load.

* Interval produced is to be 5, 10, 15, 20, or 25 seconds.

* Subject's response is verbal, pushbutton, or pedal actuation.

*  Subject  is  instructed  to  count  or  subject  is  instructed  not  to  count. 

Because performance, as well as sensitivity and diagnosticity of the technique, is affected by procedure
as  well  as  by  technique  and  measure,  each  aspect  of  a  procedure  should  be  considered  and  decided 
upon  before  actual  data  collection.  Procedural  aspects  should  be  based  on  results  reported  in  the 
literature  and  application  specifics.  As  with  the  second  example  at  the  beginning  of  this  chapter,  a 
procedural  change  as  simple  as  altering  the  mode  of  response  from  verbal  to  written  responses  can  make 
a big difference in both performance and the subjective experience of the respondent.

CHAPTER 3. ANALYTICAL TECHNIQUES


Overview of Analytical Techniques

An analytical technique produces results that are used to predict performance and estimate workload
without actually having a human operator exercise the system. This definition of analytical technique
applies even when a potential operator of the system under development, or the operator of a similar
system, may offer expert opinion as a subject matter expert (SME). In contrast to analytical techniques,
empirical techniques are those which require a human operator to interact with the system in question.
The identification and development of useful analytical procedures for estimating workload and predicting
performance continues to be an actively pursued goal. This is especially true in the applied sector, where
system developers need to assess workload early in the design process while conceptual designs are
easily modified.

The  general  difficulties  that  exist  with  the  assessment  of  OWL  (as  described  in  Chapter  2)  are  most 
pronounced  for  analytical  techniques.  The  lack  of  operator  interaction  with  the  system  presents  problems 
in  defining  the  relevant  workload  issues  and  measures.  There  is  also  the  added  difficulty  of  the  scarcity  of 
detailed  data  about  the  system  that  is  to  be  operated  by  the  human.  Typically,  analytical  techniques 
predict  performance  and  potential  performance  failures.  Workload,  therefore,  is  often  an  inference 
derived  from  a  prediction  that  a  task  cannot  be  performed  to  criteria  or  standards.  For  example,  an 
operator's activities may require more time than is available within the time constraints and requirements of
the  mission. 
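
The inference described above, from a predicted time shortfall to a workload problem, can be sketched as a simple ratio of time required to time available. This is a minimal illustration that assumes the tasks in a mission segment are performed strictly one after another; the function names, segment names, and durations are hypothetical.

```python
def time_pressure_ratio(task_times_s, time_available_s):
    """Ratio of total time required to time available for one mission segment.

    Assumes the tasks are performed sequentially, so the times simply add.
    """
    if time_available_s <= 0:
        raise ValueError("time available must be positive")
    return sum(task_times_s) / time_available_s


def overloaded_segments(segments):
    """Return the names of segments whose tasks need more time than is available.

    segments -- dict mapping segment name to (task durations, time available)
    """
    return [
        name
        for name, (task_times_s, time_available_s) in segments.items()
        if time_pressure_ratio(task_times_s, time_available_s) > 1.0
    ]


# Hypothetical mission segments, durations in seconds.
mission = {
    "approach": ([4.0, 3.5, 2.0], 12.0),
    "target handoff": ([6.0, 5.0, 4.0], 10.0),
}
flagged = overloaded_segments(mission)
```

A ratio above 1.0 predicts that the segment cannot be completed within the mission time constraints; it says nothing directly about what the operator would experience, which is the sense in which workload here is an inference.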

There is no fully accepted formal model defining the factors which drive workload nor relating the
contribution of each factor to overall workload and its subsequent impact on performance. The result of
this deficiency is that various analytical techniques use different measures to assess workload. Some
techniques estimate workload without explicitly considering the human, defining workload in terms of task
demands such as numbers of tasks to be performed. Others try to estimate the operator's attentional
reserve capacity, following theoretically constructed human models. Finally, some analytical techniques
attempt to incorporate empirical or observed human performance capabilities within the workload
estimation process.

Few, if any, of the available analytical approaches may be considered to capture the full complexity of
the workload issue. However, the techniques cover a variety of workload issues. Thus, each individual
method can provide the developer some useful OWL information as well as information about the operator
and system performance. In general, two conclusions may be drawn about analytical techniques in
particular, and OWL techniques in general:

• A battery of techniques, both analytical and, if possible, empirical, is needed for each
situation.

• Different situations require a different mix of OWL assessment techniques.

A number of very useful techniques have evolved. Some are more general than others, and some
more applicable to certain problem domains than to others; the difficulty is to determine which techniques
are best suited for a specific application. The intent of this chapter is to describe the various analytical
procedures, assess the utility of each, and provide specific examples of each procedure. Table 3-1
comprises five major categories of workload estimation techniques, each of which is described in detail in
subsequent sections of this chapter. The first class of techniques involves comparison with predecessor
or reference systems. The second technique, expert opinion, involves the elicitation of workload
estimates and predictions from operators or other system experts. Third, mathematical models represent
attempts to abstract and quantify aspects of the human-machine system through the use of formal
mathematical representations and relationships. Fourth, task analysis techniques, based on detailed
decompositions of the intended missions into individual tasks, are described. Lastly, approaches to
computer simulation of human performance are considered.


Table 3-1. Taxonomy of analytical techniques.


ANALYTICAL  TAXONOMY 

•  Comparison 

•  Expert  Opinion 

•  Mathematical  Models 

•  Task  Analysis  Methods 

•  Simulation  Models 


A Summary Evaluation of Analytical Techniques


In addition to a review of the techniques, there is an intent to provide guidance on which procedures
may be best suited to a given set of resources and measurement goals. Toward that end, Table 3-2
provides an overview of the techniques and a consensual judgment of the present authors about the data
requirements, costs, diagnosticity, and subjectivity of each technique. The column entries are defined as
follows. The term data requirements refers to the level of detail required to use the technique. These
range from system level data for comparison down to the task element level for simulations. Cost refers to
the acquisition cost of the technique while effort refers to the relative number of human hours needed to
apply the technique. Diagnosticity gives an estimate of how well the technique will pinpoint causes of
workload. Subjectivity refers to the amount of judgment required on the part of the user and/or SMEs.
The potential user may consult this table as a guide to identify techniques of particular interest, and then
pursue additional reading for more information.


Table 3-2. Comparative overview of the analytical techniques.


Technique                         Data Requirements   Cost/Effort Requirements*       Diagnosticity   Subjectivity

Comparison                        System level        Low cost/Low effort             Low             High
Expert Opinion                    Task level          Low cost/Low effort             Low-Moderate    High
Math Models                       Task level          Low cost/High effort            Low-Moderate    Low
Task Analysis: Time Based         Task level          Low cost/Moderate effort        Low-Moderate    Low
Task Analysis: McCracken-Aldrich  Task level          Low cost/Moderate effort        Low-Moderate    Moderate
Siegel-Wolf                       Task level          Moderate cost/High effort       Low             Moderate
SAINT                             Task level          Moderate cost/High effort       Low-Moderate    Moderate
Micro SAINT                       Task level          Low cost/Moderate effort        Low-Moderate    Moderate
SIMWAM                            Task level          Moderate cost/Moderate effort   Low             Moderate
SWAS                              Task element level  High cost/Moderate effort       Low             Moderate
HOS                               Task element level  Low cost/High effort            Moderate-High   Low

* Cost refers to acquisition costs in dollars. Effort includes number of personnel and development
time/effort.
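
As an illustration of how a potential user might consult the table, the following sketch encodes the Table 3-2 entries and filters them by resource limits. The ordinal ranking of the rating labels, the record layout, and the function name are assumptions introduced for the example; the ratings themselves are transcribed from the table.

```python
from typing import NamedTuple


class Technique(NamedTuple):
    name: str
    data_level: str
    cost: str
    effort: str
    diagnosticity: str
    subjectivity: str


# Assumed ordinal ranking of the rating labels used in Table 3-2.
RANK = {"low": 0, "low-moderate": 1, "moderate": 2, "moderate-high": 3, "high": 4}

TABLE_3_2 = [
    Technique("Comparison", "system", "low", "low", "low", "high"),
    Technique("Expert Opinion", "task", "low", "low", "low-moderate", "high"),
    Technique("Math Models", "task", "low", "high", "low-moderate", "low"),
    Technique("Time Based Task Analysis", "task", "low", "moderate", "low-moderate", "low"),
    Technique("McCracken-Aldrich", "task", "low", "moderate", "low-moderate", "moderate"),
    Technique("Siegel-Wolf", "task", "moderate", "high", "low", "moderate"),
    Technique("SAINT", "task", "moderate", "high", "low-moderate", "moderate"),
    Technique("Micro SAINT", "task", "low", "moderate", "low-moderate", "moderate"),
    Technique("SIMWAM", "task", "moderate", "moderate", "low", "moderate"),
    Technique("SWAS", "task element", "high", "moderate", "low", "moderate"),
    Technique("HOS", "task element", "low", "high", "moderate-high", "low"),
]


def shortlist(max_cost, max_effort, min_diagnosticity="low"):
    """Names of techniques within the cost and effort limits that also
    meet the minimum diagnosticity rating."""
    return [
        t.name
        for t in TABLE_3_2
        if RANK[t.cost] <= RANK[max_cost]
        and RANK[t.effort] <= RANK[max_effort]
        and RANK[t.diagnosticity] >= RANK[min_diagnosticity]
    ]


cheap_and_quick = shortlist("low", "low")
most_diagnostic = shortlist("high", "high", min_diagnosticity="moderate-high")
```

Such a filter only narrows the field; the ratings are consensual judgments, so the shortlisted techniques still need to be checked against the detailed descriptions that follow.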

Comparison with Existing Systems

New system development is traditionally more evolutionary than revolutionary. Typically, an enemy's
technological developments or increased level of threat requires upgrading or replacing older weapons
systems with newer versions that perform essentially the same functions. In this case, the older system
can provide an abundance of lessons learned, if that information can be obtained in a useful format. The
comparison method uses the physical and functional similarities between existing and proposed systems
to extrapolate data from the fielded system and apply them to the conceptual system. There is little
published material describing the application of comparison to workload issues, although some
techniques have been developed in allied areas. However, more formal techniques for this comparison
process must be developed if its full potential is to be realized. Relevant work which has been reported is
briefly summarized below.

Use of Comparison for Predicting Workload

A systematic attempt to use a comparative technique for predicting OWL is that of Shaffer, Shafer and
Kutch (1986). They developed workload estimates for a single crew light experimental helicopter (LHX)
scout mission. They based their estimates on an earlier detailed, time-based workload analysis of scout
missions conducted in an OH-58D helicopter with a two-person crew. Had a good workload database
already existed on the OH-58D, their comparison might have been performed more easily and effectively.
Nevertheless, their effort represents one of the first attempts to systematically compare conceptual and
existing systems in terms of OWL.

Use of Comparison for Predicting System Effectiveness


John, Klein and Taylor (1986) have developed a formalized comparison method for evaluating a system
by using analogical reasoning based upon what is known about a comparable system. Their method,
known as Comparison-Based Prediction (CBP), is an extension of Comparability Analysis used by the Air
Force to estimate system reliability and logistic requirements. CBP is essentially a technique for
structuring and quantifying SME opinion and involves identifying factors that are expected to influence
relevant system characteristics of interest. Comparison cases or systems are then selected and rated as to
whether they possess more or less of these characteristics. The causes of these judged differences are
then examined, ultimately to identify adjustment factors that can be applied to the comparison system
operational data to produce predictions for the system under study. In cases where applicable operational
data do not exist, they can be generated by SME estimates, although this will reduce confidence in the
results obtained.
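
The final step, applying adjustment factors to comparison system data, can be sketched as follows. The text above does not state how CBP combines its factors, so the multiplicative rule used here is an assumption made purely for illustration, as are the function name, the factor names, and the numbers.

```python
def adjusted_prediction(comparison_value, adjustment_factors):
    """Scale an operational value from the comparison system by SME-judged factors.

    comparison_value   -- operational datum from the comparison system
    adjustment_factors -- dict mapping a judged difference to a multiplier
                          (above 1 means more of the quantity, below 1 means less)

    ASSUMPTION: factors combine multiplicatively; the source does not specify this.
    """
    predicted = comparison_value
    for cause, factor in adjustment_factors.items():
        if factor <= 0:
            raise ValueError(f"adjustment factor for {cause!r} must be positive")
        predicted *= factor
    return predicted


# Hypothetical case: 40 training hours on the comparison system.
predicted_hours = adjusted_prediction(
    40.0,
    {"added automation": 0.8, "more complex displays": 1.5},
)
```

Keeping each factor tied to a named cause preserves the link between the prediction and the judged differences that produced it.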

CBP feasibility studies were conducted to develop estimates of the training effectiveness of three
training devices: automotive maintenance task trainers, tank gunnery simulators, and howitzer trainers
(John et al., 1986). These studies indicated that CBP was a viable estimation technique that was useful in
generating design recommendations. While CBP has not yet been applied to workload explicitly, the
authors state that it could "... enhance a preliminary subjective workload assessment model by providing
reference anchors in comparable equipment, existing metrics, and operational experience" (p. 152).

Early Comparability Analysis in Manpower, Personnel, and Training (MPT)

The Army MANPRINT initiative encourages the use of predecessor or reference systems in the
analysis of anticipated new system requirements (U. S. Army, 1987). To that end, an Early Comparability
Analysis (ECA) methodology has been developed (U. S. Army Soldier Support Center, 1986) to identify
MPT requirements early in the material acquisition process. A baseline comparison system, either an
actual whole system or a composite system made up of applicable components of other systems, is
defined and used to establish high driver tasks. These tasks, which significantly impact MPT concerns,
help to define the expected number and types of people or the required amount of training. The
MANPRINT initiative may be expected to promote the use of comparability assessments of OWL.

Summary


The advantage of the comparison technique for predicting OWL is its ability to obtain more rigorous
data than purely subjective estimates. Currently, comparison is less of a well defined technique than it is a
generalized procedure. Although one might like to see empirical workload data used as the basis for
estimating the workload on the conceptual system, such a data base could also be obtained from validated
analytical techniques such as task analysis. These data can be existing, or collected specifically for
comparison purposes. Unfortunately, most current operational systems do not have a workload database,
and for those systems that do, the data often have questionable reliability and validity.

Thus,  the  comparison  technique  offers  a  fairly  straightforward  analysis,  but  only  if  data  are  available  on  a 
predecessor  system.  The  if  seems  to  loom  large.  While  it  is  likely  that  the  technique  is  often  used 
informally  (and  overlaps  the  expert  opinion  technique),  there  appears  to  be  a  lack  of  documented 
applications.  One  major  impediment  to  making  comparison  analysis  a  viable  technique  is  the  lack  of 
systematic databases on existing systems. However, as operator-in-the-loop workload evaluations of
existing  systems  become  more  of  an  established  practice,  use  of  comparative  techniques  to  estimate 
new,  derivative  system  workload  should  be  facilitated.  For  example,  a  good,  solid  database  is  being  built 
for  helicopter  evaluations  (e.g.,  Szabo,  Bierbaum,  &  Hocutt,  1987)  which  will  make  OWL  comparison  much 
easier  for  helicopters.  If  similar  databases  are  constructed  for  other  types  of  systems,  the  comparison 
techniques  may  be  expected  to  have  growing  utility. 

Expert  Opinion 

Expert  opinion  is  the  oldest  and  most  extensively  employed  workload  prediction  technique.  This  is 
probably due to several factors including ease of implementation, relatively low cost, and a large supply of
experts. The first part of this approach, given a system defined to some preliminary level of detail, is to
identify  the  users  or  developers  of  systems  that  are  either  predecessors  or  functionally  similar  to  the 
system  under  study.  These  individuals  or  subject  matter  experts  are  then  given  a  description  of  the  new 
system  and  its  intended  use,  perhaps  within  the  context  of  a  detailed  operational  scenario.  The  next  step 
is  the  elicitation  of  the  subjective  opinions  of  the  SMEs  on  how  the  system  might  perform,  focusing  on 
major  strengths  and  weaknesses.  Analytical  evaluations  of  workload  may  be  developed  through  this 
approach  in  a  manner  similar  to  that  used  in  the  comparison  method.  Employed  as  described,  this 
technique  provides  a  capability  to  identify  broad  workload  problem  areas  early  in  the  design  process. 

The application of the expert opinion technique described above is usually relatively informal. Often, it
is of considerable benefit for the workload analyst to have an expert describe the details of operation in an
unstructured manner. However, this informality may introduce considerable variability into the quality of
the results obtained. Whether due to levels of experience, familiarity with types of systems, verbal
capabilities or SME bias, individual differences in SMEs can produce a substantial spread in the workload
estimates. There also may be miscommunication between the investigator describing the system and the
SMEs, resulting in an erroneous understanding of how the system operates. Fischhoff (1983) provides a
good overview of the problem of eliciting expert opinion. For this technique to be more objective, a
structured, formal approach is needed in both the selection of SMEs and the elicitation of information.

Delphi  Technique 

Attempts  to  structure  expert  opinion  have  been  made;  the  Delphi  method,  for  example,  has  been 
developed  for  reducing  the  variability  in  SMEs'  workload  estimates  (Dalkey,  1969).  This  technique  is  "...a 
process  whereby  subjective  judgements  or  the  implicit  decision-making  processes  of  experts  can  be 
made  more  objective  and  explicit"  (Meister,  1985,  p.  423).  Generally,  Delphi  is  administered  to  a  group  of 
SMEs.  The  eventual  goal  is  to  arrive  at  a  group  consensus,  for  example,  on  the  expected  workload  for  the 
defined  system  and  scenario.  The  Delphi  Technique  involves  several  phases,  most  of  which  are  iterations 
or  rounds  in  which  the  results  of  previous  rounds  are  summarized  and  returned  with  a  questionnaire  to  the 
group  of  SMEs.  The  method  is  most  applicable  to  situations  in  which  existing  referents  or  comparison 
systems are not available, or where extrapolation or prediction is required. The validity and reliability of
the Delphi method are subject to the same constraints as any other subjective method, but where such
methods  are  required,  the  more  structured  Delphi  method  may  strengthen  the  results. 
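
The round structure described above can be sketched in a few lines. The sketch assumes numeric workload ratings, feedback consisting of the group median and interquartile range, and a stopping rule based on the interquartile range; none of these specifics come from the sources cited here. The simulated panel, in which each SME moves halfway toward the fed-back median, is purely a stand-in for real questionnaire responses.

```python
from statistics import median, quantiles


def summarize_round(ratings):
    """Group feedback for one round: median and interquartile range."""
    q1, _, q3 = quantiles(ratings, n=4)
    return {"median": median(ratings), "iqr": q3 - q1}


def run_delphi(collect_ratings, max_rounds=4, iqr_target=1.0):
    """Iterate rounds until the spread is small enough or rounds run out.

    collect_ratings -- callable(round_number, previous_feedback) -> list of ratings
    """
    feedback = None
    history = []
    for round_number in range(1, max_rounds + 1):
        ratings = collect_ratings(round_number, feedback)
        feedback = summarize_round(ratings)
        history.append(feedback)
        if feedback["iqr"] <= iqr_target:
            break
    return history


def simulated_panel(initial_ratings):
    """Hypothetical SME panel: each member moves halfway toward the group median."""
    state = list(initial_ratings)

    def collect(round_number, feedback):
        if feedback is not None:
            state[:] = [r + 0.5 * (feedback["median"] - r) for r in state]
        return list(state)

    return collect


history = run_delphi(simulated_panel([2, 4, 5, 7, 9]))
```

In a real application collect_ratings would distribute the questionnaire with the previous summary attached and return the panel's revised ratings.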

Prospective  Subjective  Techniques 

The  most  significant  systematic  effort  in  expert  opinion  has  been  the  development  of  an  analytical, 
prospective application of the Subjective Workload Assessment Technique (SWAT), dubbed Pro-SWAT
(Reid, Shingledecker, & Eggemeier, 1984). Because less work has been done using Pro-SWAT, we
defer  discussion  of  most  of  the  details  of  its  development  and  application  to  Chapter  5  which  describes 
SWAT.  Like  SWAT,  Pro-SWAT  has  a  scale  development  phase  and  an  event  scoring  phase.  The 
procedural  outline  of  a  Pro-SWAT  session,  as  described  by  Kuperman  and  Wilson  (1985),  involves  the 
following  steps: 

• Define workload and describe SWAT and Pro-SWAT.

• Develop the measurement scale.

• Describe the mission equipment package including controls and displays (i.e., the
switching logic and formats).

• Provide an overview of the mission scenario segments that comprise the role playing
exercise.

• Execute role playing - Run through the scenario using whatever props are available.

• Obtain Pro-SWAT ratings - Obtain ratings after completion of each significant task or
mission segment.

• Conduct a structured debriefing.

Pro-SWAT has been applied to a variety of systems. Acton and Crabtree (1985) used it to evaluate an
improved version of a military C3 system; Detro (1985), Eggleston (1984), and Eggleston and Quinn (1984) all
describe applications to advanced aircraft systems; and Kuperman (1985) describes the use of Pro-SWAT
in evaluating advanced helicopter crewstation concepts. Eggleston (1984) compared Pro-SWAT and
SWAT workload ratings provided by two separate groups of pilots. One group participated in a Pro-SWAT
exercise using several configurations of an advanced attack aircraft; the other group flew these
configurations under the same scenarios in a flight simulator. A Pearson correlation coefficient of .85 was
obtained between Pro-SWAT and SWAT, indicating a high degree of agreement between the analytical
and empirical techniques.
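
For readers who wish to run the same kind of comparison, the following is a minimal sketch of the Pearson correlation between prospective and empirical ratings paired by configuration. The ratings shown are invented for illustration and are not Eggleston's data; the function and variable names are likewise assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two paired lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least two pairs")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list has no variance")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical mean ratings (0-100 scale), one pair per aircraft configuration.
pro_swat_ratings = [22.0, 35.0, 48.0, 61.0, 70.0]
swat_ratings = [25.0, 30.0, 52.0, 58.0, 77.0]
agreement = pearson_r(pro_swat_ratings, swat_ratings)
```

A high coefficient shows that the two sets of ratings rank the configurations similarly; it does not show that the absolute rating levels agree.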

Summary 

In summary, the utility of the expert opinion techniques for OWL prediction is high during initial stages
of system design. Evidence from the studies reported above suggests their use in situations when more
objective methods are not applicable, and formalizing expert opinion, as represented by the Delphi
technique, helps the SME to define workload more objectively.

Theoretically,  any  empirical  subjective  assessment  technique  such  as  SWAT  could  be  used  as  an 
analytical  technique  and  performed  prospectively.  Doing  so  would  provide  a  more  structured  process  for 
eliciting expert opinion. However, the results would be subject to the same caveats as the parent
empirical technique, as well as considerations based on the introspective nature of SME estimates.

Mathematical  Models 


One of the earliest goals of researchers in workload-related areas was to develop a rigorous
mathematical model which would be useful for predicting operator/system performance. In principle, such
a model would identify the relevant variables and combine them appropriately so that workload-associated
effects on performance could be accurately and reliably estimated or predicted. The major steps, as in all
attempts to model human performance, were to:

• Identify variables that influence workload either directly or indirectly.

• Determine the lawful relationships by which these variables combine.

• Establish how the resultant workload predictions drive predictions of performance.

To date, no fully comprehensive mathematical model has been developed. Several investigators have
taken existing models from engineering application domains and extended them to some aspect(s) of
workload-related operator performance. The most prominent of these models are based on manual
control, information theory, and queuing theory. Each model is proposed to contain some parameter or
component that reflects the operator's load or effort under specified conditions. Some models contain
specific parameters that are proposed to be an index of load; others presume loading by defining the
environmental input characteristics that are assumed to affect OWL and performance. The assumption in
both cases is that these models will predict workload-related drivers and resulting performance.

Many  of  the  models  described  below  are  aimed  at  continuous  control  tasks  or  information  monitoring 
tasks  which  have  information  presented  on  separate  displays.  In  part,  this  is  because  these  tasks  have 
been and still are important in complex system control. More importantly, the associated performance
characteristics are definable and thus are amenable to this level of mathematical modeling. Today, with
greater use of automated flight control systems and multifunction information displays, the manual control
task characteristics are becoming relatively less important. This does not mean, however, that operator
workload is concomitantly reduced. Indeed, the reverse is true. The implication is that mathematical
models need to be developed that reflect the current set of increasingly cognitive tasks.

Manual Control Models

The manual control models fall into two general categories: those based on classical control theory and
those that use modern state-space estimation methods as exemplified by the optimal control model.
Both were developed within the context of continuous manual control tasks, such as piloting a vehicle.
Consequently, their application to workload estimation and prediction is generally restricted to
environments involving continuous controlling tasks. Designers attempt to model the human operator
engaged in such a task so the combined human-machine system performance may be determined. The
resultant model reflects the effort (workload) the operator is expending in order to maintain control of the
system. Extended treatments of both of these types of models can be found in the literature (e.g., Kelley,
1968; Sheridan & Ferrell, 1974; Rouse, 1980). For an excellent treatment of behavioral aspects of control
theory see Pew (1974).

Manual control models have proven extremely valuable in aircraft system development where accurate
prediction of handling qualities is essential to development of flyable aircraft. Although these models may
be adapted to estimate measures associated with OWL in this context, the mathematical sophistication
required to develop or even understand the models limits their applicability. Detailed system parameters
must also be provided to exercise these models fully; these parameters are frequently not available during
early concept development. Consequently, manual control models are not viable for many conceptual
system evaluations.

Classical Control Theory. Classical control theory uses closed loop stability analysis methods to
generate describing functions of the human operator engaged in a continuous control task. In essence,
the human is considered to be a servomechanism attempting to eliminate perceived errors. Error, such as
deviation from path, is the input to the model, and operator response via some manipulator device is the
output. These models provide a continuous prediction of operator output over time. In workload
estimation applications, a baseline operator describing function is developed. External loading factors are
then applied which change the characteristics of the model in a manner which is believed to be indicative
of workload. For example, system response lags to operator control inputs can be varied. Changes
ascribed to increased loading may be used to predict OWL to the extent that the conditions under which
the describing function was developed are generalizable.
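
The idea of an error-nulling operator whose performance changes as the system response lag is varied can be illustrated with a small time-domain simulation. This is a sketch only, not a describing function analysis and not a model taken from the literature cited in this chapter: the operator is assumed to be a pure gain acting on a delayed error, the controlled element is assumed to be a first-order lag followed by an integrator, and all names and parameter values are hypothetical.

```python
import math
from collections import deque


def rms_tracking_error(gain, operator_delay_s, system_lag_s,
                       dt=0.05, duration_s=60.0):
    """RMS error of a delayed-gain operator tracking a slow sinusoidal target.

    gain             -- assumed operator gain on perceived error
    operator_delay_s -- assumed operator reaction delay, in seconds
    system_lag_s     -- time constant of the system's response to control input
    """
    delay_steps = max(1, round(operator_delay_s / dt))
    # Buffer of past errors; index 0 holds the error seen delay_steps ago.
    past_errors = deque([0.0] * delay_steps, maxlen=delay_steps)
    position = 0.0  # controlled variable, e.g. deviation along the path
    rate = 0.0      # lagged response of the system to the control input
    squared_error_sum = 0.0
    steps = int(duration_s / dt)
    for k in range(steps):
        target = math.sin(0.5 * k * dt)
        error = target - position
        squared_error_sum += error * error
        control = gain * past_errors[0]   # operator acts on the delayed error
        past_errors.append(error)         # oldest entry is dropped automatically
        rate += dt * (control - rate) / system_lag_s   # first-order system lag
        position += dt * rate                          # integrate to position
    return math.sqrt(squared_error_sum / steps)


# Same hypothetical operator, two system response lags.
error_short_lag = rms_tracking_error(gain=1.0, operator_delay_s=0.2, system_lag_s=0.1)
error_long_lag = rms_tracking_error(gain=1.0, operator_delay_s=0.2, system_lag_s=1.0)
```

With these assumed values the longer system lag yields the larger tracking error for an unchanged operator, which is the kind of change that classical control applications interpret as an indication of increased loading.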

An application of classical control theory to the workload estimation problem is described in Hollister
(1986). A model is developed to estimate the allocation of an aircraft pilot's attention among continuous
control and a number of other managerial tasks. The model provides insight into the nature of control task
degradation due to divided attention through changes in the describing functions. It also provides an
indication of the attentional demands required for control activity and the excess capacity left for
managerial tasks. The stated assumption is that bad handling qualities leave little capacity for managerial
tasks; good handling qualities leave more capacity. System design goals are to maximize excess control
capacity. For example, to reduce the attentional demand for primary flight control, displays can be
redesigned so that less time is required for gathering flight information. Despite the ability of the model to
predict performance, it is generally limited to continuous control workload. However, the model has been
able to predict pilot ratings of aircraft handling quality.

Optimal Control Model. Modern control theory uses sets of differential equations containing state
variables and control variables to describe the controlled system. This state-space estimation theory has
produced the optimal control model (OCM). An optimal controller, when given a process to control, does
so by (a) observing the state variables to the degree of accuracy possible, and (b) generating a control
response to these variables while minimizing a performance criterion or cost function. The criteria are
usually defined as a function of error, control effort, or time. The OCM assumes that a well trained human
operator will behave as an optimal controller. This implies that the operator will be aware of his own and the
system dynamics. That is, the operator has knowledge of human response capability, the disturbances
affecting the system, and the criterion which defines optimal control. Variables such as observation noise
and motor noise are used to introduce error (Baron, 1979) and can be related to attentional scanning,
which is one variable considered to reflect difficulty, and hence workload. OCMs of the human operator
have performed reasonably well in matching observed behavior and are capable of handling complex
multivariable systems (Baron, 1979). Within the appropriate context, the predictive validity of these
models makes them very useful, although their mathematical complexity makes them inaccessible to most
investigators.
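
For readers unfamiliar with the state-space formulation, a generic linear system and quadratic cost function of the kind referred to above may be written as follows. This is a textbook-style sketch, not the specific formulation of any study cited in this chapter; the symbols x (state), u (control), y (observed output), w (disturbance), v_y (observation noise), and the matrices A, B, C, E, Q, and R are assumptions introduced for illustration.

```latex
% Assumed linear state and observation equations
\dot{x}(t) = A\,x(t) + B\,u(t) + E\,w(t), \qquad
y(t) = C\,x(t) + v_y(t)

% Assumed quadratic cost: Q weights state error, R weights control effort
J(u) = \lim_{T \to \infty} \frac{1}{T}\,
\mathbb{E}\left\{ \int_0^T
\left[ x^{\mathsf{T}}(t)\,Q\,x(t) + u^{\mathsf{T}}(t)\,R\,u(t) \right] dt \right\}
```

In this form the trade-off between error and control effort is set by the relative sizes of Q and R, and the observation noise term is where limits on the operator's attention would enter.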

An excellent treatment of applications of OCM to workload estimation may be found in Levison (1979).
In this report, Levison traces the development of the model, defines the basic workload model, cites a
number of validation studies, and suggests issues for further development of the model. Additional
examples of the model's application can be found in Rickard and Levison (1981) for the prediction of pilot
ratings of the handling quality of different aircraft configurations, and in Wewerinke (1974) and Smit and
Wewerinke (1978). These applications of OCM predict a workload index based on control effort which is
developed in terms of OCM parameters. Levison (1970) defines an OCM model containing an attention
parameter which influences the observation noise within the state variable estimator. This parameter can
be used to determine the attention allocated to a display variable and hence the relative importance of that
display variable in a control task. The OCM model can also be used for display design evaluation (Baron &
Levison, 1977; Gainer, 1979).

A recent development of the OCM approach is the Procedure-Oriented Crew (PROCRU) Model
(Baron, Zacharias, Muralidharan, & Lancraft, 1980). PROCRU provides a framework for dealing with both
discrete and continuous tasks. In a discrete task application, Levison and Tanner (1971) replaced the
control law with a Bayesian formulation and were able to simulate human performance for detection of a
signal in noise. Thus, OCM has considerable breadth and most of the studies have corresponding
validation data. OCM is clearly a performance model with parameters which represent workload
manipulations. These manipulations are of the form of amplitude, frequency, or phase lags in the
equations. As a result, workload definitions are as varied as the manipulations employed.

Information Theory Models

Information theory as applied to models of human activity achieved its height of popularity during the
1960's. A good general treatment of information theory can be found in Sheridan and Ferrell (1974).
Applications of information theory in psychology can be found in Attneave (1959) and Garner (1962).
Information theory provides a metric of the transmission of information through an imperfect
communication channel. The metric is stated in terms of the log (base 2) of the number of alternatives
weighted by their probabilities of occurrence. Information transmission is a reduction in the number of
alternatives which is expressed as a reduction of uncertainty. Two alternatives which contain common
information are said to be redundant. The channel imperfections are defined, for example, as noise and
limits of channel capacity which result in lost information (equivocation).
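
The metric described above can be illustrated with a minimal sketch. It assumes the standard Shannon form, in which the average information of a set of alternatives is the probability-weighted sum of the log (base 2) of each alternative's inverse probability. The function name and the example probabilities are hypothetical and are not taken from the sources cited.

```python
import math


def entropy_bits(probabilities):
    """Average information, in bits, of alternatives with the given probabilities."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Alternatives with zero probability contribute no information.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Four equally likely alternatives: log2(4) = 2 bits of uncertainty.
equal = entropy_bits([0.25, 0.25, 0.25, 0.25])

# Unequal probabilities lower the average uncertainty (1.75 bits here).
skewed = entropy_bits([0.5, 0.25, 0.125, 0.125])
```

Information transmitted through a channel would then be the reduction in this quantity between what is uncertain before and after the message is received.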

One of the first applications of information theory to the workload domain was that of Senders (1964).
In this application, a model was used to describe the division of attention by an operator monitoring
information displays. It assumed that an operator, with a limited input channel capacity, sampled each
information display at a frequency necessary to reconstruct the signal being presented on that display
within specific error tolerances. The amount of time spent sampling each instrument is summed over all
instruments to determine the fraction of the operator's time that must be spent observing. This time
fraction is used as a measure of visual workload imposed by the information displays.
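
A rough sketch of this calculation follows. It assumes that each instrument is sampled at twice its signal bandwidth (the sampling-theorem rate) and that every sample takes a fixed dwell time; both assumptions, the function name, and the instrument values are illustrative and are not taken from Senders (1964).

```python
def visual_workload_fraction(instruments):
    """Fraction of the operator's time spent observing.

    instruments: list of (bandwidth_hz, dwell_time_s) pairs, one per display.
    Assumes a sampling rate of 2 * bandwidth samples per second per display.
    """
    return sum(2.0 * bandwidth * dwell for bandwidth, dwell in instruments)


# Hypothetical panel of three instruments.
panel = [(0.10, 0.4), (0.25, 0.4), (0.05, 0.5)]
fraction = visual_workload_fraction(panel)  # about 0.33 of the time is spent observing
```

A fraction approaching 1.0 would indicate that monitoring alone consumes nearly all of the operator's visual attention.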

The  use  of  information  theory  in  the  analysis  and  estimation  of  workload  has  been  limited.  Despite 
some efforts (e.g., Crawford, 1979; Rault, 1976), applications in realistically complex environments are
difficult  to  achieve  due  to  the  necessity  of  a  priori  establishment  of  the  relevant  simple  and  conditional 
stimulus  and  response  probabilities.  Because  information  theory  provides  output  with  respect  to  steady- 
state  situations,  it  is  not  well  suited  for  representing  dynamic  changes  in  workload.  The  impact  of 
information  theory  is  probably  most  strongly  felt  through  the  adoption  of  its  concepts  such  as  limited 
channel  capacity,  information  transmission,  redundancy,  and  other  concepts  now  contained  in  information 
processing approaches to behavior (Garner, 1974).

Queuing  Theory  Models 


Queuing theory models of human-machine interaction characterize the operator as a single-channel
processor sharing attentional resources serially among a variety of tasks. The human is conceptualized as
a "server" processing multiple tasks and "server utilization" or "busyness" is used as a measure of
workload. These models generally apply to situations in which performance times are critical. Within
queuing theory, performance times include both the time it takes to execute various tasks, as well as the
time that tasks must wait before being performed. Rouse (1980) provides a good discussion of queuing
theory and its application to human-machine modeling.
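
The notions of server utilization and waiting time can be sketched as follows. The sketch assumes the simplest textbook case, a single server with Poisson task arrivals and exponentially distributed service times (an M/M/1 queue); that choice, the function names, and the example rates are assumptions made for illustration and are not drawn from Rouse (1980).

```python
def server_utilization(tasks):
    """Fraction of time the operator (server) is busy.

    tasks: list of (arrival_rate_per_s, mean_service_time_s) pairs, one per task type.
    """
    return sum(rate * service for rate, service in tasks)


def mm1_mean_wait(arrival_rate, mean_service_time):
    """Mean time a task waits before service begins in an M/M/1 queue."""
    rho = arrival_rate * mean_service_time
    if rho >= 1.0:
        raise ValueError("utilization of 1.0 or more: the queue grows without bound")
    return rho * mean_service_time / (1.0 - rho)


# Two hypothetical task types: utilization is about 0.7.
busyness = server_utilization([(0.2, 2.0), (0.1, 3.0)])

# One task every 2 s on average, 1 s mean service time: mean wait of 1.0 s.
wait = mm1_mean_wait(0.5, 1.0)
```

Waiting time rises sharply as utilization nears 1.0, which is why busyness can serve as a workload index when performance times are critical.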

The  emphasis  in  queuing  models  is  more  on  when  tasks  are  performed  rather  than  how  they  are 
performed.  As  indicated  by  Rouse,  these  models  are  most  appropriate  in  multitask  situations  in  which  the 
operator must cope with task priorities and with performance requirements that vary among the tasks.
Using  Jahns'  (1973)  categorization  of  workload  (Chapter  2),  queuing  theory  models  are  concerned 
primarily  with  the  input  load  to  the  operator.  A  benefit  of  queuing  models  is  that  fractional  attention  is 
computed  as  a  function  of  time  and  system  performance  dynamics  are  taken  into  account. 

The queuing theory approach to workload estimation is generally considered in conjunction with
Senders' analysis of monitoring tasks (e.g., Senders, Elkind, Grignetti, & Smallwood, 1966; Senders &
Posner, 1976). However, others such as Schmidt (1978), analyzing the workload of air traffic controllers,
and Walden and Rouse (1978), modeling pilot decision behavior, have also successfully applied this
approach.

Other Mathematical Models

The above sections have suggested the major applications of mathematical models to predicting
workload. However, a variety of other modeling approaches have been proposed, but have had limited
use in a workload context. For example, Moray (1976) discussed the use of Signal Detection Theory.
Signal detection involves asking a subject to detect signals embedded in noise. Detection of a signal
when present is a true positive (correct) and detection of a signal when none was presented is a false
positive (error). By varying the probability of a signal actually present, it is possible to generate receiver
operating curves (ROC) which indicate both true signal detection and subject bias for false positives.
Signal detection analogues have been developed and used within optimal control theory (Levison &
Tanner, 1971); this application may be useful for predicting OWL.
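
As an illustration of how detection and bias can be separated, the sketch below computes the conventional sensitivity index (d') and response criterion (c) from a true positive rate and a false positive rate. It assumes the equal-variance Gaussian form of signal detection theory; the function name and the example rates are hypothetical.

```python
from statistics import NormalDist


def d_prime_and_criterion(hit_rate, false_alarm_rate):
    """Return (d', c) under the equal-variance Gaussian signal detection model.

    Both rates must lie strictly between 0 and 1.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal distribution function
    z_hit = z(hit_rate)
    z_fa = z(false_alarm_rate)
    sensitivity = z_hit - z_fa        # larger values mean better discrimination
    criterion = -0.5 * (z_hit + z_fa)  # positive values mean a conservative observer
    return sensitivity, criterion


# Hypothetical observer: 80% true positives, 20% false positives.
sensitivity, criterion = d_prime_and_criterion(0.8, 0.2)
```

Each (false positive rate, true positive rate) pair obtained under a different signal probability is one point on the ROC.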

Finally, White, MacKinnon and Lyman (1965) have outlined a model based on a modified Petri net
system for workload estimation and prediction. The work was an attempt to demonstrate that the model
was sensitive to workload manipulations and achieved promising results. However, the predictive
capability of the model is still to be demonstrated.

Summary

The  application  of  manual  control  theory  to  workload  estimation  and  prediction  is  generally  restricted  to 
environments  involving  continuous  controlling  tasks.  During  that  period  when  workload  was  practically 
synonymous  with  vehicular  control,  manual  control  models  were  easily  the  most  interesting  and  promising 
class  of  techniques  providing  predictions  to  system  designers.  In  the  present  day,  these  models  may  be 
adapted  to  estimate  measures  generally  associated  with  OWL,  but  the  mathematical  sophistication 
required  to  develop  or  even  understand  the  models  limits  their  applicability.  Detailed  system  parameters 
must also be provided to exercise these models fully; these parameters are frequently not available during
early  concept  development.  Consequently,  manual  control  models  are  generally  not  viable  for  most 
conceptual  system  evaluations. 

The popularity of mathematical models seems to have waned. Information theory was most popular in
the 1960's and manual control theory and queuing theory predominated during the 1970's. Although
many of these models have experienced considerable success within the domain for which they were
intended, they seem to have been supplanted in the 1980's by computerized task analysis and
simulation models. A major problem with mathematical modeling is the absence of explicitly defined
workload parameters. Thus, while model outputs may identify and quantify particularly busy periods within
a given time slice, or particularly high periods of information transfer, it is never quite clear how, or if, these
phenomena relate to high workload. This observation, it should be pointed out, is not restricted to
mathematical models alone and probably has relevance to most analytical techniques and methodologies.

There  is  always  a  place  for  a  useful  mathematical  model,  even  if  the  model  is  not  as  broad  as  one  would 
like. An obvious and hopeful evolution would be that certain of these mathematical models, especially the
optimal control model which can cover aspects of queuing formulations, might be incorporated into the
simulation models. It would certainly seem feasible to bring such models into simulations in a form which
more  people  could  use. 


Task Analysis

Task analysis techniques have a long history (Drury et al., 1987) and are the most commonly used of all
analytical tools for predicting workload in the preliminary design process. This is partly due to the military
requirement for a task analysis to be performed during system development (MIL-H-46855B). It is a fairly
natural extension from this requirement to derive OWL estimates from the task analysis.

Task analysis methods seek to produce operator performance requirements as a function of fixed
increments of time defined against a scenario background. The basic task analysis process begins with
definition of a mission scenario or profile. Next, the general mission requirements are systematically
decomposed into mission segments, functions, and operator tasks; the tasks in turn are decomposed into
detailed operator task element requirements. These elemental task requirements are defined as operator
actions required to complete the task within the context of the system characteristics. Thus, the timing
and sequencing of operator actions will depend on the nature and layout of controls and displays. The
result of the analysis is an operator activity profile as a function of mission time and segment, essentially a
time-based analysis of performance requirements.

A natural consequence of time-based task analysis is to define OWL operationally as time stress. Time
stress is expressed as a ratio of Time required (Tr) to perform a task over the Time available (Ta), yielding
Tr/Ta. Workload situations of concern are, therefore, those which cause the operator to approach the
edges of the performance envelope, that is, Tr/Ta approaches 1.0. This definition encompasses only one
aspect of workload: time stress. A technique incorporating such a definition is useful, but probably best
utilized as an initial coarse filter to identify gross design deficiencies and for cases in which the time
required for a task is well defined. Diagnosticity, in the time-line technique, is limited to identifying general
functional limitations where demands exceed operator capacity to respond within some time frame.
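
The Tr/Ta index lends itself to a very small sketch. The segment names, the times, and the warning threshold of 0.8 are hypothetical values chosen for illustration; the report itself states only that ratios approaching 1.0 are of concern.

```python
def time_stress(time_required, time_available):
    """Time stress index Tr/Ta for one task or mission segment."""
    if time_available <= 0:
        raise ValueError("time available must be positive")
    return time_required / time_available


def flag_segments(segments, threshold=0.8):
    """Return names of segments whose Tr/Ta meets or exceeds the threshold.

    segments: list of (name, time_required_s, time_available_s) tuples.
    """
    return [name for name, tr, ta in segments if time_stress(tr, ta) >= threshold]


mission = [
    ("takeoff", 45.0, 60.0),             # Tr/Ta = 0.75
    ("target acquisition", 28.0, 30.0),  # Tr/Ta is about 0.93
    ("cruise", 20.0, 120.0),             # Tr/Ta is about 0.17
]
flagged = flag_segments(mission)  # ["target acquisition"]
```

Used this way, the index acts as the coarse filter described above: it points to segments that deserve closer analysis without saying why they are demanding.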

Other  approaches  are  more  detailed  in  the  analysis  of  tasks,  further  partitioning  them  into  components 
relevant  to  sensory  channel  or  body  part  (e.g.,  eyes,  ear,  hand,  foot,  etc.).  Recent  methods  have 
included  a  still  more  detailed  analysis  structure  in  an  attempt  to  identify  types  of  cognitive  loads  imposed 
on  the  operator.  However,  these  more  detailed  approaches  still  typically  contain  time  stress  (Tr/Ta)  as  a 
major contributor in the estimation of workload. Nevertheless, diagnosticity improves by virtue of
identification  of  specific  components  that  may  be  overloaded. 

There  are  many  variations  on  the  basic  task  analysis  structure.  The  differences  will  be  clarified  in  the 
discussions of each of the methods. The models presented here are intended to be illustrative of the
class  of  information  that  can  be  integrated  into  the  models  and  the  nature  of  the  results  that  can  be 
obtained.  A  review  of  many  task  analysis  techniques  may  be  found  in  Meister  (1985). 

Time-Based Task Analysis Procedures

Timeline Task Analysis. A recent application of the timeline analysis technique employing the
Tr/Ta metric is that described in Stone, Gulick and Gabriel (1987). They used this technique to identify
workload  with  respect  to  specific  sensory-motor  channels  encountered  in  overall  aircraft  operations. 
Validation  efforts  are  reported  by  the  authors,  with  the  results  indicating  that  the  procedure  "...provides  a 
reasonably  accurate  index  for  predicting  the  time  required  to  complete  observable  tasks  within  the 
constraints  of  an  actual  mission." 

Workload Assessment Model (WAM). The Workload Assessment Model was introduced as part of
a more comprehensive human-machine system design aid, Computer Aided Function-Allocation
Evaluation System (CAFES). WAM is intended to estimate the effects of alternate function allocations on
OWL (Edwards, Curnow, & Ostrand, 1977). In WAM, a mission timeline is developed which indicates what
tasks are performed during the mission and in what sequence they are performed. The individual
sensory-motor channels (e.g., eyes, ears, hands, feet, etc.) that are involved in the execution of each task
are identified. WAM computes the channel utilization percentage including the amount of time that each
channel is occupied within a specific time segment. Percentages over a specified threshold level are
considered excessive, and identify either function allocation deficiencies, design inadequacies, or both.
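
A minimal sketch of a channel utilization calculation of this kind is given below. It is not the WAM implementation; the data layout, the function name, the example tasks, and the 80 percent threshold are assumptions made for illustration.

```python
def channel_utilization(tasks, segment_start, segment_end):
    """Percent of a time segment during which each channel is occupied.

    tasks: list of (start_s, end_s, channels) tuples, where channels is a
    list of channel names such as "eyes" or "hands".
    Concurrent tasks on the same channel are simply added, so values above
    100 percent are possible and indicate a conflict.
    """
    length = segment_end - segment_start
    busy = {}
    for start, end, channels in tasks:
        # Portion of the task that falls inside the segment.
        overlap = max(0.0, min(end, segment_end) - max(start, segment_start))
        for channel in channels:
            busy[channel] = busy.get(channel, 0.0) + overlap
    return {channel: 100.0 * t / length for channel, t in busy.items()}


tasks = [
    (0.0, 6.0, ["eyes", "hands"]),
    (4.0, 9.0, ["eyes"]),
    (8.0, 12.0, ["ears"]),
]
utilization = channel_utilization(tasks, 0.0, 10.0)
excessive = [ch for ch, pct in utilization.items() if pct > 80.0]  # ["eyes"]
```

In this example the eyes are committed for 110 percent of the segment, which is the kind of result that would prompt reallocation of a function or a design change.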

A  variant  of  WAM,  the  Statistical  Workload  Assessment  Model  (SWAM),  allows  shifting  excessive 
workload  tasks  in  time  in  an  attempt  to  reduce  the  workload  level.  This,  in  effect,  is  a  rescheduling  of  tasks 
to  reduce  time  stress.  Linton,  Jahns,  and  Chatelier  (1977)  report  one  application  of  SWAM.  They 
examined  a  conceptual  VF/VA-V/STOL  aircraft  to  determine  whether  a  single  pilot  could  manage  the 
aircraft and its avionics subsystems in defined mission phases. The results indicated the potential
single-pilot operability for the aircraft, but did not establish any validity measures for the assessment technique.

The Time-Based Analysis of Significant Coordinated Operations (TASCO). TASCO analyzes
tactical mission cockpit workload using the standard time-based approach (Roberts & Crites, 1985; Ellison
& Roberts, 1985). The basic analytical component of the method is the EDAM (Evaluation, Decision,
Action, and Monitoring) loop. Evaluation takes into account the impact of information display design. The
decision is made by the pilot based on training, experience, tactical doctrine and situational awareness
applied to the evaluation of the data displayed. The decision results in an action via the cockpit controls
which is then monitored to evaluate the outcome of the action.

Two types of analysis are performed in TASCO. The first is crewstation task analysis, which is a design
evaluation performed by an SME using a 5 point rating scale to judge design elements that are especially
crucial to mission performance. The second is a Busy Rate Index analysis, which is essentially a Tr/Ta
estimate over a set time interval. How the above mentioned EDAM loops are integrated into these
analyses is unclear, as is the current state of development of the TASCO model.

Computerized Rapid Analysis of Workload (CRAWL). CRAWL involves expert opinion
superimposed upon a task analysis background with two basic sets of inputs (Bateman & Thompson,
1986; Thompson & Bateman, 1986). The first set of inputs includes task descriptions generated by
SMEs on the proposed system under study, along with SME-generated workload ratings for four separate
channels: visual, auditory, cognitive and psychomotor. Additionally, the average time for task completion
and a short verbal description of each task are included. The second set of inputs contains timing
information, including the starting time for each occurrence of each task executed during the mission
segment. Overall workload for each time segment is computed by summing the workload ratings for the
four channels.
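
The two sets of inputs and the summation can be sketched as follows. This is an illustrative reading of the description above, not the CRAWL software; the data layout, the task names, the ratings, and the two-second segment length are all hypothetical.

```python
CHANNELS = ("visual", "auditory", "cognitive", "psychomotor")


def crawl_workload(tasks, occurrences, segment_length, n_segments):
    """Overall workload per time segment.

    tasks: dict mapping task name to (average_duration_s, ratings), where
        ratings maps each channel name to an SME rating (first input set).
    occurrences: list of (task name, start_time_s) pairs (second input set).
    """
    totals = [0.0] * n_segments
    for name, start in occurrences:
        duration, ratings = tasks[name]
        end = start + duration
        task_load = sum(ratings[ch] for ch in CHANNELS)
        for i in range(n_segments):
            seg_start = i * segment_length
            seg_end = seg_start + segment_length
            if start < seg_end and end > seg_start:  # task is active in this segment
                totals[i] += task_load
    return totals


tasks = {
    "scan radar": (4.0, {"visual": 5, "auditory": 0, "cognitive": 3, "psychomotor": 1}),
    "radio call": (3.0, {"visual": 1, "auditory": 4, "cognitive": 2, "psychomotor": 1}),
}
occurrences = [("scan radar", 0.0), ("radio call", 2.0), ("scan radar", 6.0)]
profile = crawl_workload(tasks, occurrences, segment_length=2.0, n_segments=5)
# profile == [9.0, 17.0, 8.0, 9.0, 9.0]
```

The second segment stands out because the radar scan and the radio call overlap there.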

In an effort to validate CRAWL, workload estimates obtained while operators flew a single seat simulator
were compared to CRAWL predictions of workload for six combat mission scenarios. Overall, an average
correlation of 0.74 was found between the predicted workload levels and pilot subjective workload ratings
obtained during the simulation study. The correlation indicates good agreement between the two
measures. 

Workload Index (W/INDEX). W/INDEX combines mission, task, and timeline analyses with theories
of attention and human performance to predict attentional demands in a crew station (North, 1986). It
differs from other task analytical techniques by providing estimates of the effect of time-sharing loads
imposed by concurrent task demands. W/INDEX estimates workload demands for one-second segments
based on individual task difficulty and time-sharing deficits.

W/INDEX operates on the following data:

•  Crewstation interface channels,

•  Human activity list,

•  Attention involvement levels,

•  Interface conflict matrix, and

•  Operator activity timelines.

W/INDEX  was  applied  to  three  different  conceptual  cockpit  designs  and  was  demonstrated  to  be  sensitive 
to  design  changes  although  apparently  not  validated  against  empirical  studies. 

The McCracken-Aldrich Approach

McCracken,  Aldrich,  and  their  associates  have  recently  developed  a  task  analysis  approach  for 
predicting  OWL  that  does  not  rely  solely  on  the  time-based  definition  of  workload  (McCracken  &  Aldrich, 
1984; Aldrich, Craddock & McCracken, 1984; Aldrich & Szabo, 1986). These authors attempted to
improve the diagnosticity of workload predictions by identifying four (and later, five) behavioral dimensions
which contribute to overall workload levels. They were also among the first to isolate explicitly cognitive
workload  demands.  This  approach  has  impacted  other  task  analysis  methods  (e.g.,  CRAWL  described 
above)  and  simulation  methods  (e.g.,  Micro  Saint,  described  below). 

The  McCracken-Aldrich  methodology  involves  performing  mission  and  task  analyses  that  generate  a 
rough  timeline  (i.e.,  one  without  a  strict  time  scale)  of  operator  tasks.  These  tasks  are  further  partitioned 
into  elemental  task  requirements  which,  based  on  system  characteristics,  are  used  to  generate  estimates 
of workload for up to five workload dimensions (Szabo et al., 1987):

•  cognitive,

•  visual,

•  auditory,

•  kinesthetic, and

•  psychomotor.


Workload assessments are made by assigning numerical ratings for each of the applicable workload
components. These ratings represent the difficulty or effort associated with performing the task. It is in
the ratings that this technique differs most from other task analyses. The ratings are generated by
comparing verbal descriptors of the task elements with the verbal anchors identified with each scale value.
The five workload components are assigned scale values of one through seven (Szabo et al., 1987). For
example, during the post mission checklist of a helicopter, the copilot performs the task of inspecting the
exterior of the aircraft. That task, in turn, requires that the copilot "visually inspect each side of the
airframe" (visual scale value = 2) and "evaluate the current status of the airframe for damage" (cognitive
scale value = 2). The scale and verbal anchors for the cognitive component are presented for illustrative
purposes in Table 3-3.

Estimates of the duration of each task element ultimately are developed to construct a detailed task
timeline using one-half second intervals. Total workload is estimated by summing across concurrent task
elements for each workload component (visual, auditory, cognitive, kinesthetic, and psychomotor) during
each time interval. If this sum exceeds a threshold value, e.g., 7 on visual, then the operator is assumed to
be overloaded on the component. The frequency of overloaded intervals for each mission segment can
then be determined and the causative workload component identified.
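
The summation and threshold test can be sketched as follows. The sketch follows the description above (half-second intervals, a threshold of 7, summation within each component across concurrent task elements); the data layout, the function name, and the example ratings are hypothetical and are not taken from McCracken and Aldrich.

```python
COMPONENTS = ("cognitive", "visual", "auditory", "kinesthetic", "psychomotor")


def overload_counts(task_elements, n_intervals, interval_s=0.5, threshold=7):
    """Number of overloaded intervals for each workload component.

    task_elements: list of (start_s, end_s, ratings) tuples, where ratings
    maps a component name to its scale value (1 to 7) for that element.
    """
    counts = {component: 0 for component in COMPONENTS}
    for i in range(n_intervals):
        t0 = i * interval_s
        t1 = t0 + interval_s
        for component in COMPONENTS:
            # Sum the ratings of all task elements active during this interval.
            total = sum(
                ratings.get(component, 0)
                for start, end, ratings in task_elements
                if start < t1 and end > t0
            )
            if total > threshold:
                counts[component] += 1
    return counts


elements = [
    (0.0, 2.0, {"visual": 5, "cognitive": 3}),
    (1.0, 3.0, {"visual": 4, "psychomotor": 2}),
]
counts = overload_counts(elements, n_intervals=6)
# The visual sum is 9 between 1.0 s and 2.0 s, so counts["visual"] == 2.
```

Because the counts are kept per component, the output identifies which component causes the overload as well as how often it occurs.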


Table  3-3.  Cognitive  workload  component  scale  (McCracken  &  Aldrich,  1984). 


Scale Value    Verbal Anchors

     1         Automatic, simple association
     2         Sign/signal recognition
     3         Alternative selection
     4         Encoding/decoding, recall
     5         Formulation of plans
     6         Evaluation, judgement
     7         Estimation, calculation, conversion

Hamilton and Harper (1984) proposed a modification of the McCracken-Aldrich technique. Their variant
replaces the summation method of workload estimation with an interference matrix approach for detailed
workload analysis. This matrix defines acceptable, marginal, and unacceptable workload levels for each of
four workload components. A series of decision rules are then employed to define whether or not entire
mission segments have acceptable, marginal, or unacceptable workload levels. This technique alleviates
certain interpretive problems concerning the implication of having, for example, a total mission segment
rating of 10 on visual tasks with a scale range of only one to seven. Validation efforts with this technique
indicated that it is sensitive to task differences and reflected empirical pilot opinion ratings obtained in
simulation studies. It was also found to predict slightly higher workload ratings than those obtained by the
empirical rating; this bias may be desirable for design purposes.

Cognitive Task Analysis


The idea that a more detailed task-analysis structure can provide increased diagnosticity is an important
one. Combining this idea with the fact of increased influence of cognitive tasking leads to the approach of
detailed decomposition of cognitive workload into component types. This approach has been developed
and applied to selected aircraft systems (Zachary, 1981). As in more traditional task analysis, operator
tasks are decomposed and are grouped into four primary categories: cognitive, psychomotor, motor, and
communicative/interactional. A mission scenario is independently developed with a variable timeline grain
depending on mission segment (for example, an attack mission segment may be decomposed to second
by second events whereas a return-to-base segment may be decomposed into five minute intervals).
Operational personnel then work with cognitive scientists to map operator tasks onto the scenario
timeline. Next, workload levels are assigned to each operator task as the scenario unfolds. Workload
ratings for the same task may vary depending on the mission segment in which it is performed.

In  particular,  the  workload  analysis  is  based  on  a  set  of  workload  rating  scales  that  describe  five  distinct 
types  of  cognitive  workload: 

•  planning  difficulty, 

•  prediction  difficulty, 

•  calculation difficulty,

•  information processing complexity, and

•  information  absorption  complexity. 

In addition, eight other workload scales are utilized in the categories of: psychomotor (pointer movement
and writing), motor (button-pushing frequency and keyset entry frequency), and interactional (interruption
frequency, interruption magnitude, communication frequency, and communication complexity).
Application of this methodology for each time segment yields individual ratings on thirteen scales and
averaged ratings for the four categories (cognitive, motor, psychomotor, and interactional), as well as an
overall workload (average of 13 measures). This promising methodology has been recently applied to two
systems: the P-3C anti-submarine warfare tactical coordinator (Zaklad, Deimler, Iavecchia, & Stokes,
1982) and the F/A-18 single-seat aircraft (Zachary, Zaklad, & Davis, 1987). Little formal validation has as
yet been accomplished, although the effort is still ongoing.
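
The aggregation of the thirteen scales can be sketched as follows. The scale names come from the lists above; the dictionary layout, the function name, and the use of a simple unweighted mean are assumptions, since the report states only that category ratings are averaged and that overall workload is the average of the 13 measures.

```python
from statistics import mean

SCALES = {
    "cognitive": [
        "planning difficulty",
        "prediction difficulty",
        "calculation difficulty",
        "information processing complexity",
        "information absorption complexity",
    ],
    "psychomotor": ["pointer movement", "writing"],
    "motor": ["button-pushing frequency", "keyset entry frequency"],
    "interactional": [
        "interruption frequency",
        "interruption magnitude",
        "communication frequency",
        "communication complexity",
    ],
}


def summarize(ratings):
    """Return (category means, overall mean) for one time segment.

    ratings: dict mapping each of the thirteen scale names to its rating.
    """
    category_means = {
        category: mean(ratings[scale] for scale in scales)
        for category, scales in SCALES.items()
    }
    overall = mean(ratings[scale] for scales in SCALES.values() for scale in scales)
    return category_means, overall
```

Note that the overall value is the mean of the thirteen individual ratings, not the mean of the four category means; the two differ whenever categories contain different numbers of scales.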

Summary 

Task analysis has demonstrated high utility. The definitions of workload within the various task analyses
are not complete, but being based principally on time, they are clearly closely related to perceived OWL.
Indeed, the criteria for most tactical missions contain a temporal component in the measure of
effectiveness (MOE). And it is true, if a task cannot be done within the time requirements, of what
importance is accuracy? For those situations in which time required (Tr) is estimated to be near or
approaching the performance envelope boundaries (Ta), additional evaluations can and should be
performed to identify OWL components which may be adversely affecting performance time.


Simulation Models

The application of simulation models to the workload estimation problem is conceptually an extension
of the traditional operator-in-the-loop simulation procedure. The major difference, of course, is that the
simulation effort is expanded to include a simulated operator. Similarly, simulation may be considered an
extension of task analysis. Within simulation models, differences among the models include: (a) whether
operator characteristics must be defined along with system and environmental characteristics or (b)
whether the operator model is included as part of the overall simulation model. Meister (1985) and Chubb,
Laughery and Pritsker (1987) review simulation models and their applications.

Good  descriptions  of  the  operator,  system  and  operational  environment  are  the  first  prerequisites. 
Given  such  a  model,  the  problem  remains  to  define  an  appropriate  workload  index  that  can  be  used  to 
compare  differences  across  tactical  missions,  system  configurations  or  operational  uses.  In  most 
instances, a task loading index such as time required/time available is used. Furthermore, some simulation
models can predict not only operator workload, which itself may or may not affect system performance, but
also system performance for future comparison with empirical measures of effectiveness (MOEs).


Simulation vs. Task Analysis


The distinction between the task analysis methods and the computer simulation methods is not always
clear. Simulation models have been described as elaborated task analysis methods with consideration of
the statistical nature of constituent elements. Most computer simulation models employ a task analysis as
part of the development effort, and most task analytical methods are now computerized. The basic
distinction that is intended in this categorization is that the task analysis methods produce operator
performance requirements as a function of fixed increments of time defined against a scenario
background. Simulation models, in contrast, attempt to represent (simulate) operator behavior statistically
within the system under study and produce measures of effectiveness for human-system performance. In
other words, running a computerized task analysis twice would yield identical answers. Running a
simulation model twice would not necessarily yield the same results due to different consequences of
branching statements and statistical modification of task times and, where appropriate, performance
accuracies.
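
The contrast can be made concrete with a small sketch. The task times, the plus or minus 20 percent spread, and the 10 percent chance that a task must be repeated are hypothetical values; they stand in for the statistical modification of task times and the branching mentioned above and do not describe any particular simulation model.

```python
import random

TASK_TIMES = [4.0, 2.5, 6.0]  # hypothetical nominal task times, in seconds


def task_analysis_total(task_times):
    """Deterministic task analysis: the same inputs always give the same total."""
    return sum(task_times)


def simulated_total(task_times, rng, spread=0.2, retry_probability=0.1):
    """One simulation run with random task times and a random retry branch."""
    total = 0.0
    for nominal in task_times:
        low, high = (1.0 - spread) * nominal, (1.0 + spread) * nominal
        total += rng.uniform(low, high)
        if rng.random() < retry_probability:  # branch: the task must be repeated
            total += rng.uniform(low, high)
    return total


fixed = task_analysis_total(TASK_TIMES)  # 12.5 on every run
run_a = simulated_total(TASK_TIMES, random.Random(1))
run_b = simulated_total(TASK_TIMES, random.Random(2))  # generally differs from run_a
```

Repeating the simulation many times yields a distribution of completion times rather than a single value, which is what allows a simulation model to estimate probabilities of success as well as mean performance.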

Types of Models

Recently, Sticha (1987) has discussed two general types of models to simulate human performance.
According to Sticha, the difference between these two existing classes can be stated in terms of the ways
in which the control of sequencing of the behaviors is accomplished. The first of these is a network
model. This approach controls the order directly in a network by means of the way the analyst has
developed the procedures - order is defined in the procedure. Network models are a combination and
amalgamation of a number of techniques: flowcharts, program evaluation and review technique (PERT),
Markov models, decision trees, and reliability models. The second method of simulation is the production
rule approach. Production models control the ordering through a set of production rules and through
these rules by the environment. Sequencing is indirectly inferred by a set of rules which associate a
behavioral action with an environmental event. The actions are performed only when the environmental
conditions of the rules have been met - order is thus defined by the environment. There are no true
production models used in workload; however, there are several hybrid models employing both the
network and the production rule approach. Sequitur's Workload Analysis System (SWAS) and the Human
Operator Simulator (HOS) are examples of hybrid models. Although both classes of models may in some
situations produce identical results, they have different capabilities. In particular, Sticha points out that
procedural tasks are characterized by internal control whereas tasks involving the recall and application of
rules are driven by the environment.
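
The two forms of sequencing control can be contrasted in a short sketch. The task names, events, and rules below are hypothetical and are not drawn from Sticha (1987); they only show where the ordering information lives in each approach.

```python
# Network model: the analyst fixes the order inside the procedure itself.
# Each hypothetical task names its successor directly.
NETWORK = {"detect": "identify", "identify": "report", "report": None}


def run_network(start):
    """Follow the successor links until the procedure ends."""
    order, task = [], start
    while task is not None:
        order.append(task)
        task = NETWORK[task]
    return order


# Production-rule model: each rule pairs an environmental condition with an
# action, so the order of actions follows the order of events.
RULES = [
    (lambda event: event == "blip_on_scope", "identify"),
    (lambda event: event == "radio_call", "report"),
]


def run_production(events):
    """Fire every rule whose condition matches each incoming event."""
    actions = []
    for event in events:
        for condition, action in RULES:
            if condition(event):
                actions.append(action)
    return actions


print(run_network("detect"))
print(run_production(["radio_call", "blip_on_scope"]))
```

Reordering the event list changes the output of the production version but has no effect on the network version, which is the difference in control described above.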

Siegel-Wolf Models


The majority of simulation models are derivatives of the network model developed by Siegel and Wolf
(1969). Siegel and Wolf models come in several variants involving the number of operators simulated.
The basic purpose of the models is to provide an indication to developers about where in a proposed
system the operators may be over-stressed or under-stressed. The models predict task completion times
and probabilities of successful task completion. The variable that relates to workload is termed stress.
Stress is caused by:

• falling behind in time on task sequence performance,

• a realization that the operator's partner is not performing adequately,

• the inability to successfully complete a task on the first attempt with the possible need
for repeated attempts, or

• the need to wait for equipment reactions.

Both time and quantity of tasks enter into the stress definition. Note, however, that task quantity can be
reduced to time. Stress is typically calculated as the ratio of the sum of the average task execution times to
the total time available. A task difficulty factor has been included in recent model developments (Meister,
1985).
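
The stress ratio described above reduces to a single division. The sketch below assumes only that definition; the function name and the example task times are hypothetical, and the task difficulty factor mentioned by Meister (1985) is not modeled.

```python
def stress_ratio(average_task_times, time_available):
    """Sum of average task execution times divided by total time available."""
    if time_available <= 0:
        raise ValueError("time_available must be positive")
    return sum(average_task_times) / time_available


# Hypothetical remaining tasks, in seconds.
remaining = [12.0, 8.0, 10.0]

print(stress_ratio(remaining, 40.0))  # 0.75: the work fits in the time available
print(stress_ratio(remaining, 24.0))  # 1.25: more work than time
```

A value above 1.0 means the operator cannot finish the remaining tasks at average speed in the time left.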

Input to the network model typically consists of 11 data items for each subtask and operator (Meister,
1985). These are shown in Table 3-4. There are many sources of the necessary data, including detailed
task analysis, but the major source is direct questioning of subject matter experts (SMEs). The type of
data input is usually not sensitive to design changes within a specific type of system component (e.g.,
dials), but can differentiate between different types of components (e.g., dials vs. status lights). Model
outputs include a number of performance measures such as number of runs, average run time, number
and percent of successful runs, average, peak, and final stress, and several others. The primary uses for
these models are for the coarse prediction of system effectiveness and design analysis. Siegel-Wolf
models are typically used for discrete task modeling.

SAINT/Micro  SAINT 


An important extension of the Siegel-Wolf model is called the System Analysis of Integrated Networks
of Tasks (SAINT). SAINT, along with its microcomputer version Micro SAINT, is actually a task network
simulation language. It contains a number of process branching rules, multiple distributions for modeling
individual task operations, and a Monte Carlo sampling procedure for determining task execution. As a
general purpose simulation language, it provides a framework and contains little implicit information toward
a developed model. This means that operator, system, and environmental characteristics must be
entered by the modeler. Micro SAINT provides a menu-driven interface to facilitate this development
effort. SAINT's underlying approach to estimating workload is the same as the Siegel-Wolf models. SAINT
defines stress as the ratio of time required to complete a task to the time available (Tr/Ta). SAINT can be
used to model both discrete and continuous tasks.


Table 3-4. The eleven data elements required for each subtask and operator for Siegel-Wolf Models
(from Meister, 1985, p. 125).


1. Decision subtasks,

2. Non-essential subtasks,

3. Subtasks which must be completed before it can be attempted by
another operator,

4. Time before which a subtask cannot be started,

5. The subtask that must be performed next,

6. Average task duration in seconds,

8. Average standard deviation of task duration,

9. Probability of being successful,

10. Time required for all remaining essential tasks, and

11. Time required for all remaining non-essential tasks.


Micro SAINT has been used in conjunction with a separate workload estimation methodology.
Laughery et al. (1986) used Micro SAINT to predict OWL in four alternative helicopter cockpit designs
using a model which incorporated characteristics of the operator, a helicopter, and the threat environment
as task networks. OWL was assessed during the Micro SAINT simulation following the technique
developed by McCracken and Aldrich (1984). The use of the McCracken-Aldrich task analysis required
the assignment of workload requirements for each of five workload components - auditory, visual,
cognitive, kinesthetic, and psychomotor dimensions - for each operator activity. Thus, each task is
characterized by its requirements for each of the components. Overall, workload could then be assessed
for tasks executed individually or in combination if executed concurrently. Workload was assessed at
2-second intervals in order to track it through the simulated mission scenario. The results demonstrated that
the methodology was sensitive to variations among the helicopter designs, and that specific component
overloads could be identified. The authors report that total development and execution time was on the
order of 10 weeks, although subsequent development times can be substantially less. This integration of
network simulation with more robust and diagnostic workload prediction methodologies is a promising
development.
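
The interval-by-interval assessment can be sketched as follows. This is an illustration only, not the Laughery et al. (1986) model: the activities, the ratings, the rule that concurrent ratings are summed within a component, and the overload cutoff are all assumptions. Only the five component names and the 2-second step come from the text.

```python
COMPONENTS = ("auditory", "visual", "cognitive", "kinesthetic", "psychomotor")
INTERVAL_S = 2            # the 2-second step reported in the text
OVERLOAD_THRESHOLD = 7    # hypothetical cutoff, chosen for illustration only

# Hypothetical activities: (start_s, end_s, rating per component).
ACTIVITIES = [
    (0, 6, {"visual": 4, "cognitive": 3, "psychomotor": 2}),
    (2, 4, {"visual": 5, "auditory": 2}),
    (4, 8, {"cognitive": 5, "auditory": 3}),
]


def workload_timeline(activities, mission_length_s):
    """For each interval, sum component ratings over all active activities
    (summation is an assumption) and list components above the cutoff."""
    timeline = []
    for t in range(0, mission_length_s, INTERVAL_S):
        totals = dict.fromkeys(COMPONENTS, 0)
        for start, end, ratings in activities:
            if start <= t < end:
                for component, rating in ratings.items():
                    totals[component] += rating
        overloads = [c for c in COMPONENTS if totals[c] > OVERLOAD_THRESHOLD]
        timeline.append((t, totals, overloads))
    return timeline


for t, totals, overloads in workload_timeline(ACTIVITIES, 8):
    print(t, totals, overloads)
```

Keeping a separate total per component is what allows a specific component overload to be identified, rather than only an overall one.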

Simulation for Workload Assessment and Manning (SIMWAM)

Another related simulation methodology is called the Simulation for Workload Assessment and
Manning (SIMWAM) (Kirkpatrick, Malone, & Andrews, 1984). SIMWAM is based on SAINT and the
Workload Assessment Model (WAM) (Edwards, Curnow, & Ostrand, 1977), but has been developed to
make it especially suitable for examining manpower issues, as well as individual operator workload, in
complex multi-operator systems. SIMWAM has been used to assess workload and manpower issues for
an aircraft carrier's aircraft operations management system (Malone, Kirkpatrick, & Kopp, 1986).
Specifically, the SIMWAM application focused on the effects of incorporating an automated status board
(ASTAB) into the existing system. The scenario involved S5 shipboard operators engaged in the
launch-recovery cycle of 25 aircraft. Two workload assessments were made: one on the existing baseline system
and another with the proposed ASTAB. The results of the analysis indicated that the introduction of
ASTAB would allow a reduction in the number of required personnel by four individuals. That conclusion
was based on the workload having been reduced to near zero for these four individuals, where workload
was defined by number of tasks they performed and the amount of time that they were busy (i.e.,
occupied with tasks). Also, the number of operators who were heavily loaded (i.e., busy at least 75% of
the time) was reduced by one half. Thus, SIMWAM provides a basis for predicting the impact on
manpower requirements of proposed system modifications. Such results are especially meaningful to
program managers.
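
The two workload indices used in that analysis, number of tasks and proportion of time busy, are simple to compute once a simulation has produced per-operator task durations. The sketch below is not SIMWAM itself; the operator names, durations, and scenario length are hypothetical, and tasks are assumed not to overlap in time for a given operator.

```python
HEAVY_LOAD_FRACTION = 0.75  # "busy at least 75% of the time", per the text


def busy_fraction(task_durations_s, scenario_length_s):
    """Fraction of the scenario an operator is occupied with tasks.
    Assumes the operator's tasks do not overlap in time."""
    return sum(task_durations_s) / scenario_length_s


def summarize(operators, scenario_length_s):
    """Return per-operator (task count, busy fraction, heavily loaded?)."""
    summary = {}
    for name, durations in operators.items():
        fraction = busy_fraction(durations, scenario_length_s)
        summary[name] = (len(durations), fraction, fraction >= HEAVY_LOAD_FRACTION)
    return summary


# Hypothetical operators and task durations in seconds.
baseline = {"plotter": [300, 300, 200], "talker": [100, 50], "recorder": [10]}
print(summarize(baseline, 1000))
```

Running the same summary on a baseline and on a modified system, then comparing the two, mirrors the kind of before-and-after comparison described above.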

Sequitur's Workload Analysis System (SWAS)

Sequitur's Workload Analysis System is a hybrid model incorporating features of both types of models,
network and production techniques, as discussed in the introductory section on simulation models
(Holley & Parks, 1987). In contrast to the network models discussed above which are performance
simulation tools, this model has been developed specifically for workload analysis. The definition of
workload is the by now familiar time required over time available (Tr/Ta). Success is defined strictly in terms
of the Tr/Ta ratio.

SWAS contains a structured helicopter task database, organized according to task categories which in
turn are broken into task blocks containing task elements. (This task analysis follows requirements in
MIL-H-46855B.) Each task element in the database has ten attributes including the mean time and standard
deviation, and differentiation of discrete and continuous tasks. It also has built in assumptions about the
organization and functioning of behavior, following the Wickens (1984) resource model. This model plays
a major role in the organization, sequencing, and resource time-sharing of task elements as well as
modification of performance times. (See Navon (1984) for a critical review of the resource model.)

Additionally, SWAS contains a Methods Time Measurement (MTM) module which is used to assist the
user in producing mean performance times. Finally, equations are built in to adjust for types of clothing
and individual differences (on a scale from 1 = good to 9 = bad). Both means and standard deviations are
adjusted in a multiplicative manner in the equations.

The model has received several validation studies at Bell Helicopter comparing the simulation results
with results from operator-in-the-loop studies using both simulation and actual flight of a single pilot
helicopter. In these studies, error rates predicted by SWAS differed from operator times by 1% to t'%
(underestimate).

Human Operator Simulator (HOS)

The Human Operator Simulator (HOS) is a simulation model using an approach different from the
Siegel-Wolf models (Wherry, 1969; Lane, Strieb, Glenn, & Wherry, 1981; Harris, Glenn, Iavecchia, &
Zaklad, 1986). The original HOS approach was based on four assumptions:

• Human behavior is predictable and goal oriented, especially for trained operators.

•  Human  behavior  can  be  defined  as  a  sequence  of  discrete  micro-events,  which  can 
be  aggregated  to  explain  task  performance. 

•  Humans  can  time-share  (switch)  among  several  concurrently  executing  tasks. 

•  Fully  trained  operators  rarely  make  errors  or  forget  procedures. 

The implication of these assumptions is that the model is deterministic, that is, the outcomes of operator
actions are derived from functional relationships formed as equations rather than by sampling from a
probability distribution.

The latest version, HOS-IV, is a general purpose simulation facility that provides the capability to predict
system performance by dynamic, interactive simulation of the human operator, the hardware/software
system, and the environment. HOS-IV is implemented on a microcomputer (IBM PC-AT) (Harris, Iavecchia,
Ross, & Shaffer, 1987). HOS-IV contains an enhanced user interface to assist in defining, executing, and
analyzing the simulation. The HOS-IV user can build independent models of the environment, hardware,
and operator to the desired level of detail using a top-down approach. Operator task times can be crudely
estimated and entered into the simulation or tasks can be decomposed in order to utilize the set of basic
human performance micromodels resident in HOS. For example, a target recognition task could be
modeled coarsely by merely specifying a time estimate for the overall recognition process. Alternatively,
the recognition task could be decomposed into micro-events such as an eye movement followed by a
visual perception followed by a decision. In the latter case, HOS-IV would determine the time required to
complete the task.

HOS-IV contains a library of human performance micromodels that can be used to simulate the timing
and accuracy of particular human behaviors. The core set of micromodels is based on experimental
literature and can be accessed by the user. The micromodels include eye movement, visual perception,
decision time, short-term memory, listening and speaking, fine-grained control manipulation, hand
movement, and walking. These micromodels can be easily modified or replaced entirely.

Models of environment, system, and operator are defined with the following simulation building blocks:

• An object database containing names and characteristics of the entities to be
simulated (for example, Emitters, Radar, Displays, and Controls).

• A set of rules which start an action when conditions are appropriate.

• A set of sequential actions required to accomplish a process. The process can be
defined for the environment, system, or operator. Operator processes can utilize
human micromodels provided by HOS-IV.

• An optional set of events which define external occurrences that affect the simulation
flow at predetermined times.

The result of the simulation is a detailed timeline of operator, hardware, and environmental events and
actions which can be summarized and analyzed for a broad variety of purposes. Standard output analyses
are available which provide statistics associated with performing tasks, subtasks, and basic behaviors. This
includes the number of times a micromodel is executed, the mean and standard deviation of the time to
complete a process, and the percent of simulation time spent on each process. Additionally, the user can
define and access information on system measures of effectiveness.
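
The standard output statistics listed above can be derived from any event timeline. The following is a sketch of that summarization step only, not HOS-IV code: the record layout and process names are hypothetical, and the population standard deviation is used as an assumption so that a process executed once reports zero spread.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical timeline records: (process name, start_s, end_s).
TIMELINE = [
    ("eye_movement", 0.0, 0.5),
    ("visual_perception", 0.5, 1.5),
    ("eye_movement", 1.5, 2.5),
    ("decision", 2.5, 4.0),
]


def summarize_timeline(timeline):
    """Per process: execution count, mean and SD of duration, and percent
    of total simulation time."""
    durations = defaultdict(list)
    for name, start, end in timeline:
        durations[name].append(end - start)
    first_start = min(start for _name, start, _end in timeline)
    last_end = max(end for _name, _start, end in timeline)
    total = last_end - first_start
    return {
        name: {
            "count": len(d),
            "mean_s": mean(d),
            "sd_s": pstdev(d),
            "percent_time": 100.0 * sum(d) / total,
        }
        for name, d in durations.items()
    }


for name, stats in summarize_timeline(TIMELINE).items():
    print(name, stats)
```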

Lane et al. (1981) identified a number of applications and validation efforts over a wide range of
systems. Generally, the results have been very favorable. HOS allows a very detailed model to be
developed, providing a greater degree of diagnosticity than other simulation models. HOS is probably
more applicable as a follow-on analysis after less detailed analytical techniques have been used to refine
the system design.

Model Human Processor (MHP)

Card, Moran and Newell (1983, 1986) have developed a potentially powerful collection of micromodels
collectively called the Model Human Processor (MHP). Via the MHP, they have established a framework
for presenting data contained in the human performance literature in a manner which will make it more
accessible to those involved in the engineering design process. They partition human behavior models
according to their application to the perceptual, cognitive, or motor systems, and focus on simpler, more
widely applicable models that capture the predominant characteristics of a problem. Models such as these
can be used to define limits of operator-system effectiveness to any scope required. The MHP
micromodels are currently only described in the literature. Some of the MHP models, however, have been
directly incorporated into the HOS library and are accessible to simulation modelers. Further work in the
development and application of human performance models is required. MHP has proven a fruitful model
for analysis of computer interfaces, not covered by other models (Card, Moran & Newell, 1983).
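
As an illustration of how such micromodels are applied, the sketch below estimates a simple reaction time by summing one cycle of each processor. The cycle times and their ranges are the nominal figures commonly quoted from Card, Moran and Newell (1983); they are entered here as assumptions and should be checked against that source before use. The function and variable names are hypothetical.

```python
# Processor cycle times in milliseconds: (nominal, fastest, slowest).
# Figures as commonly quoted from Card, Moran and Newell (1983); treat the
# exact values as assumptions to be verified against that source.
CYCLE_MS = {
    "perceptual": (100, 50, 200),
    "cognitive": (70, 25, 170),
    "motor": (70, 30, 100),
}


def predict_ms(steps):
    """Sum cycle times over a sequence of processor steps.
    Returns (nominal, fastest, slowest) in milliseconds."""
    nominal = sum(CYCLE_MS[s][0] for s in steps)
    fastest = sum(CYCLE_MS[s][1] for s in steps)
    slowest = sum(CYCLE_MS[s][2] for s in steps)
    return nominal, fastest, slowest


# Simple reaction: perceive the signal, decide to respond, press the key.
print(predict_ms(["perceptual", "cognitive", "motor"]))  # (240, 105, 470)
```

Reporting a fastest and slowest bound alongside the nominal value is what lets models of this kind define limits of performance rather than a single point estimate.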

Summary

In recent years, a number of new simulation tools have been developed. Simulations offer a unique
opportunity to evaluate both time and accuracy of performance. There is a cost, however, for gaining the
accuracy evaluation and that is the additional time required for developing the simulation. However, this
may be a small price to pay in the context of overall system development costs.

For the most part, more user friendly versions of simulation models have been developed in the last
several years. As additional modules and computer tools are developed and more complete databases
are built, simulation techniques will move to the forefront of analytical workload techniques.

Overall  Summary  and  Concluding  Comments 


Analytic  techniques  can  be  used  to  make  predictive  workload  assessments  early  in  system 
development.  An  important  characteristic  of  these  techniques  is  that  they  may  be  used  before  there  is  an 
"operator-in-the-loop."  Therefore,  workload  predictions  may  be  available  to  have  impact  on  early  system 
design. 

Analytic  techniques  can  be  divided  into  five  major  categories: 

•  Comparison, 

•  Expert  Opinion, 

•  Mathematical  Models, 

•  Task  Analysis  Methods,  and 

• Simulation Models.

This analytic technique taxonomy provides a useful structure in which to classify workload assessment
tools that can be used while system concepts and alternatives are being explored. The categories of
techniques described require different information, and specific techniques may be more appropriate for
answering different kinds of questions.

Some  of  the  analytic  techniques  have  not  yet  been  systematically  formalized  or  fully  validated  (e.g., 
comparison).  Further  work  should  be  done  to  develop  these  techniques  for  workload  assessment  that 
can be used very early in conceptual development and system design.

CHAPTER 4. EMPIRICAL TECHNIQUES - PRIMARY TASK MEASURES


Overview of Empirical Techniques

With this chapter, we begin the review of empirical techniques used to measure operator workload. As
discussed in Chapter 2 and illustrated in Table 2-1, we divided empirical techniques into four major
categories:

• Primary Tasks,

• Subjective Methods,

• Secondary Tasks, and

• Physiological Techniques.

Each of these major classes of measures has been researched extensively and we have provided an
overview of a number of studies in each category. Further, each of the categories has distinctive features,
especially in the context of workload definitions, and these features are manifested in the literature. We
have sought to capture these distinctions and differing flavors in our reviews, and accordingly the review
for each category differs both in the approach to the literature and organization.

In our discussion of OWL assessment techniques, the overall intention is both an analysis, especially in
the context of sensitivity and diagnosticity, and an integration of the literature. The objective of this
integration is to provide practical guidance for designers, developers, and evaluators of systems. It is
recognized that the individual who should be concerned with human workload issues cannot wade
through hundreds of studies to obtain OWL assessment guidance. Resources are very limited and
should be expended largely performing the OWL assessment, not learning about workload research.
Thus, each class of OWL techniques is reviewed with summaries and recommendations provided.

A  Summary  Evaluation  of  Empirical  Techniques 

Because of the amount and variety of material to follow, a summary evaluation of selected techniques is
shown in Table 4-1. The entries in the table represent the authors' considered judgments on the
sensitivity, cost and effort, and diagnosticity for a number of the techniques to be discussed. The
techniques shown in Table 4-1 were judged on a basis relative to all the other measurement techniques,
not just within their own category. Also, the techniques were rated independently for each of the three
criteria. (The authors of this volume have collectively worked with virtually every technique in the table.)
Please note, it may be better to use a technique rated "Low" than no technique at all. Although relative
judgments have been attached to these techniques, all techniques can be used to obtain information
regarding OWL. In addition, as a point made throughout this report, multiple measures of workload should
be used to obtain more complete information regarding potential and existing OWL problems.


Table 4-1. Summary of empirical techniques judged for sensitivity, cost, and diagnosticity.


Technique                     Sensitivity           Cost/Effort Requirements            Diagnosticity

Primary Task Measurement
  System Response             Low/High[1]           Low Cost, Moderate Effort           Low
  Operator Response           High[1]               Low/Moderate Cost, Moderate Effort  Moderate/High

Subjective Methods
  Analytic Hierarchy Process  High[2]               Low Cost, Low Effort                Moderate[2]
  Bedford                     High[2]               Low Cost, Low Effort                Low
  Cooper-Harper               High for psychomotor  Low Cost, Low Effort                Low
  Modified Cooper-Harper      High                  Low Cost, Low Effort                Low
  NASA-TLX                    High                  Low Cost, Low Effort                Moderate/High
  SWAT                        High                  Low Cost, Low Effort                Moderate/High
  Psychometric Techniques     High                  Low Cost, Low Effort                Low
  Interviews                  Varies                Low Cost, Low Effort                Moderate/High
  Questionnaires              Varies                Low Cost, Low Effort                Moderate/High

Secondary Tasks
  Embedded Secondary Task     High                  Low Cost, Low Effort                Moderate/High
  Choice Reaction Time        Moderate              Moderate Cost, Low Effort           Moderate
  Sternberg Memory Task       Moderate              Moderate Cost, Low Effort           Moderate
  Time Estimation Task        Moderate              Moderate Cost, Low Effort           Moderate

Physiological Techniques
  Blink Rate                  Low                   Moderate Cost, Moderate Effort      Low
  Body Fluid Analysis         Low                   Low Cost, Low Effort                Low
  Evoked Potentials           Moderate              High Cost, High Effort              High[3]
  Eye Movements & Scanning    High                  High Cost, High Effort              High[3]
  Heart Rate                  Moderate              Moderate Cost, Moderate Effort      Moderate
  Heart Rate Variability      Moderate              Moderate Cost, Moderate Effort      Moderate
  Pupil Measures              Moderate              High Cost, Moderate Effort          Moderate[3]

[1] Varies with workload.

[2] Represents some uncertainty about sensitivity and diagnosticity due to limited research.

[3] The rating applies within a narrow, specialized range.

The sensitivity rating reflects the relative ability of the measure to discriminate among different levels of
workload. The cost and effort requirements reflect a judgment of the overall resource requirements
including personnel, time, effort, and equipment. The diagnosticity rating reflects the usefulness of the
measure in pinpointing the processes involved in high workload.

As will be seen below, primary task measurement has some interesting properties that cause sensitivity
to vary with workload. Although relative judgments have been made regarding secondary tasks, there is
uncertainty as to their sensitivity and diagnosticity outside of the aviation environments. For those entries
with more than one rating, the judgments are intended to reflect the range of sensitivity or diagnosticity.
There are several areas where insufficient information exists to make a judgment, although preliminary
findings suggest the degree of sensitivity or diagnosticity; these uncertainties are marked. A few entries
reflect the variable nature of the measurement technique depending on specific situations; these are also
marked.


Video recording of operator performance is a useful tool in OWL assessments, but cannot be easily
placed in a table such as the one presented. It can be used as an important, practical empirical method
and should not be overlooked in developing empirical measurement procedures.

For primary task techniques, there are a very large number of specific measures that have been used -
nearly every situation requires its own measures. Because of this diversity, theoretical and conceptual
analysis is very important. First, we have classified primary task measures into system performance and
operator performance measures. Then, the development and selection of unique primary measurements
is considered. Primary task measurement is covered in this chapter.

Subjective  methods  research  is  very  different.  The  emphasis  is  on  assessing  the  operator's 
experiences  and  the  amount  of  subjective  effort  expended.  Most  OWL  research  is  concerned  with 
subjective  rating  scales,  but  there  are  a  relatively  small  number  of  these  scales  in  wide  use.  Accordingly, 
our  review  focuses  on  these  scales  in  detail  and  analyzes  the  comparative  features  of  the  rating  scales. 
Subjective  techniques  are  covered  in  Chapter  5. 

For secondary task techniques, the situation is somewhat similar to primary task techniques in that a
great many individual measures have been used; however, a substantial part of the research has utilized a
limited number of techniques. In our discussion, we have examined some theoretical issues underlying
the use of secondary tasks. These issues revolve around attempts to assess residual capacity or to fill and
load that residual capacity. Our review and analysis take the point of view that real systems are multitask
environments and that the secondary task paradigm is most effective in that context. Chapter 6 covers the
secondary techniques.

Finally, physiological techniques represent a different class of techniques. In particular, these
techniques generally require specialized expertise, extensive equipment, and procedures sometimes
difficult to perform outside the laboratory. We have provided some background and rationale for the use
of these techniques, techniques that probably assess activation or arousal. Chapter 7 covers the
physiological class of techniques.

Primary Task Measurement


The goal of system development is to produce a system which reliably achieves its mission. The
operator is an important part of the system. System performance is a combination of operator performance
and the hardware system and is reflected in meeting the mission goals. It is the operator's task by means
of decision making, integration of information, manipulation of controls, etc. to guide the hardware toward
successful completion of the mission. The system responds to the operator's commands. Thus, it is
reasonable to talk about two kinds of performance: the operator and the system. A statement about
operator performance is meaningless unless system performance is also acceptable. Accordingly, there is
a need to measure both.

OWL, as was discussed in Chapter 2, is not the same as performance of the operator or the system.
OWL was defined as the relative capacity to respond. OWL arises as the interaction between the operator
and other system components during mission execution. Workload evaluation assesses this interaction,
i.e., the contribution of the operator to the system and the impact of the hardware and other situational
components on the operator. Stated differently, workload evaluation assesses the location of the
operator within the workload envelope. One approach to assessing OWL is by means of primary task
measures.

Primary  Task  Definition 


Even though it may seem surprising, it is not always clear what is meant by a primary task. In flying or
driving, the primary task for the operator is to keep the rubber side down, that is, operate the vehicle in a
manner that will maintain proper vehicle orientation. But the operator may have other important functions
within a mission. Communications is often considered a secondary or subsidiary task. However, if an
aircraft is performing a scout function, accurate and timely communication would be of utmost importance
to the mission. Similarly, what is a copilot's primary task? In some helicopters, the job is designated as
Copilot/Gunner. At least by the designation, the Copilot/Gunner has two primary tasks, and he may also
have to handle communications. Thus, here and in other easily developed examples, the operator may
have several primary tasks.

During the course of a mission the emphasis on various operator tasks will change, that is, the priorities
associated with one primary task will change in relation to another. What is labeled as the primary task may
change depending on the specific situation and where the operator is in the overall mission. For example,
most investigators analyzing a mission have seen the need to divide the mission into segments to capture
the flavor of the different task emphases and priorities. Further, the label, primary task, is sometimes
simply a definition assigned by interested analysts, such as a workload evaluator. (This definitional issue
will arise again in Chapter 6 when secondary tasks are discussed.) In some cases, the
definition is clear; in other cases, however, it may seem somewhat arbitrary. For these reasons, it is better
to think of multiple tasks rather than a single primary task. In short, one needs to evaluate several tasks
coupled with the priorities associated with those tasks and not just a single primary task. We are going to
discuss primary tasks in a general way and as though the definition were always clear.

Primary task measures are important as part of a battery of workload measures. However, it should be
pointed out that this position is not universal. Some authors (e.g., Hart, 1986a; O'Donnell & Eggemeier,
1986) state that primary task measurement may not be useful in workload assessment. Certainly, there are
many examples where this is true. However, the very fact that the results appear to be contradictory
suggests that further analysis and clarification are in order. The following discussion provides the necessary
clarification.

Primary  Task  Measurement  Types 


Primary task techniques may be categorized into two broad types. Type 1 includes those measures
which are of the system and contain a contribution in some form (sometimes unknown) of operator
performance. For an instrument landing task, glide slope and localizer errors, often measured using root
mean square (RMS) or standard deviation of relative position, are of this type (Wierwille et al., 1985). Type
2 measures, by contrast, are a more direct index of operator performance, often finer-grain, fine structure
measures that reflect strategies adopted by the operator to cope with task demands. For the landing task
example, this could be the number of control movements per unit time (as measured at the stick or
column, not at the aircraft control surfaces).
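
The two measure types for the landing example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the sampled-data inputs, and the choice to count a "control movement" as a direction reversal of the stick outside a small deadband are all hypothetical and are not taken from the studies cited.

```python
import math


def rms_error(deviations):
    """Type 1 sketch: root-mean-square of glide slope (or localizer) deviations."""
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))


def control_movements_per_second(stick_positions, sample_rate_hz, deadband=0.01):
    """Type 2 sketch: stick direction reversals per second.

    Assumption: a control movement is counted each time the stick changes
    direction; changes no larger than `deadband` are ignored as noise.
    """
    reversals = 0
    last_direction = 0  # 0 means no movement seen yet
    for prev, curr in zip(stick_positions, stick_positions[1:]):
        delta = curr - prev
        if abs(delta) <= deadband:
            continue
        direction = 1 if delta > 0 else -1
        if last_direction != 0 and direction != last_direction:
            reversals += 1
        last_direction = direction
    duration_s = (len(stick_positions) - 1) / sample_rate_hz
    return reversals / duration_s
```

Note that the Type 2 function reads the stick signal itself, in keeping with the point that the measurement is taken at the stick or column rather than at the control surfaces.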

To understand the importance and implications of this classification, let us consider again the
relationship between performance and workload shown earlier in Figure 2-1. Figure 4-1 is a replot of
Figure 2-1.


• Region 1 - the operator's load is too low. (This region is not discussed in this report.)

• Region 2 - the operator's load is not excessive, because additional resources can
and may be mustered to maintain performance, and the performance level is held
relatively constant and high.

• Region 3 - the operator's load has become excessive. In this region, the load
increases well beyond the operator's capability for compensation, and performance
levels become asymptotically low.

Figure 4-1. Hypothetical effect of workload on sensitivity of Type 1 and Type 2 measures.

These response regions provide a framework for understanding where Type 1 and Type 2 primary task
measures are sensitive to workload. Sensitivity is a critical issue with primary task measurements. Type 1
measures will be sensitive in Region 3, whereas Type 2 measures will be sensitive in Region 2 as well as
in Region 3.

Type 1 Measures: The System+Operator. Type 1 measures of primary task performance are indices of
system+operator performance. Typically, they include measures of human tracking errors or other
measures of system performance (e.g., Wierwille et al., 1985). However, measures of system
performance such as engine thrust, RPM, or movement of control surfaces could also be classified as Type 1
measurements, since changes in thrust, for example, reflect operator activities plus system lags. Similarly,
any measure of effectiveness (MOE) for mission performance would ordinarily qualify as a Type 1
measure. Type 1 measures were an initial focus of workload research, no doubt because of their
association with the quality of system performance (Sanders, 1979; Williges & Wierwille, 1979). This
category of measures provides an index of system performance (MOEs) and is useful in this regard.

Type 2 Measures: The Operator. Type 2 measures of primary task performance are defined here
as those which assess the nature of operator performance directly (Hart, 1986a). The measurement can
take several different forms: a measure may be directed at quantity, frequency, or quality criteria of
operator performance. Type 2 measures may also be directed toward detecting the fine structure of
operator performance, i.e., those that link operator activity to measurable performance (Hart, 1986a). In
general, the category includes such measures as: (a) control movements per second in a psychomotor
task, (b) response times in a perceptual or cognitive task, (c) errors of omission, (d) errors of commission,
or (e) communications response times in a communications task (Wierwille et al., 1985). The very reason
Type 1 measures are insensitive is also the reason Type 2 measures are sensitive: as the operator copes
with workload and, under increasing load, marshals greater resources to hold Type 1 performance
constant, the operator may perform differently, patterns of performance may change, and fine structure
tends to shift. Type 2 measures are valuable because these shifts may provide evidence of a change in
OWL and hence provide a means to assess OWL levels.

Comparison of Type 1 and Type 2. Several studies have used Type 1 measurements in parallel
with Type 2 measures. For example, O'Donnell and Eggemeier (1986) discuss a study by Schultz, Newell
and Whitbeck (1970) which showed increases in turbulence had no effect on glide slope error (Type 1).
Similarly, Wierwille et al. (1985) did not find significant effects of task loading on glide slope error.
However, if one examines the frequency of control inputs for wheel, column, or throttle (Type 2), there is a
clear effect of turbulence on frequency of control movements in the Wierwille et al. study and in two other
separate studies by Dick (Dick, 1980; Dick et al., 1976). (In the Dick studies, pilot ratings of handling
quality ranged from 3 in a no turbulence condition to 7 in a high turbulence condition and there was no
effect of turbulence on glide slope error.)

Sanders, Burden, Simmons, Lees, and Kimball (1978) tested nine helicopter pilots on each of three
levels of stabilization augmentation (for yaw, pitch and roll) during hover. Altitude control was under
manual control for all three conditions. Thus, the stabilization device should facilitate altitude control since
less effort would be expended on the other dimensions. Type 1 measures did not show any effect for the
three levels of stabilization augmentation or for altitude control. In short, system performance did not differ
for the three conditions. By contrast, Type 2 measures for Fore-Aft control and pedal movements for
altitude control showed significant variations with both fewer movements and smaller magnitude of
movement with the stabilization augmentation devices operational. Other Type 2 measures did not show
an effect. (Of interest, average pilot ratings only ranged from 3.1 to 4.3 for the three conditions, showing a
relatively small subjective spread among the conditions.) In accordance with the Type 1 - Type 2
distinction, Type 1 measures were insensitive while Type 2 measures were sensitive to variations in
variables that affect workload.

Summary. Although we have cited only a few studies, the general statement can be made that
Type 1 measures of system+operator are not often sensitive to workload manipulations; however, they
are important in system evaluation considerations. Type 2 measures of the operator directly generally
show effects on relevant dimensions, relevant dimensions being those measures one would reasonably
expect to show a difference, as in the Sanders et al. (1978) study. Type 2 measures are essential for
workload evaluation.

Enhancing Type 2 Measures: The Fine Structure of Behavior

Some investigators have questioned both the sensitivity and the diagnostic capability of primary
measures. However, as shown above, when one makes the distinction between Type 1 and Type 2
measures, it is clear that Type 2 primary measures are sensitive. Furthermore, it is possible to enhance
sensitivity and often diagnosticity by examining the fine structure of behavior. This enhancement can be
an especially valuable approach, because time and money are almost always limited. Some ideas are
developed below which provide background for practical application of such measures.

Many of the primary task measures shown to be especially sensitive to workload variations are indicators
of strategy shifts (Hart, 1986a). While some investigators have avoided the strategy interpretation, the
results reported seem to be consistent with the ideas being developed here. (See O'Donnell and
Eggemeier [1986] for additional studies in this category.) Strategy is widely used in describing behavior,
and the term without restriction encompasses too many types of action descriptions, including style, S-R
mapping process, etc. Accordingly, we will use rule as a more neutral, easily defined, and precise term.
This usage has parallels with the idea of rules in production models of behavior (Card, Moran, &
Newell, 1983). Rule driven performance changes also proved to be sensitive to manipulations of load for
many of the techniques and measures investigated by Wierwille et al. (1985).

A  brief  digression.  In  order  to  draw  out  the  value  of  Type  2  measures  fully,  it  is  appropriate  to 
consider  what  is  meant  by  rule  driven  performance.  In  some  sense,  one  could  argue  that  all  behavior  is 
rule driven. There are global rules which might involve survival, for example, and there are more detailed
rules.  What  we  are  interested  in  primarily  is  the  fine  structure  of  rule  driven  behavior.  Identification  of  rules 
is  done  by  inference  from  detailed  examination  of  the  performance  measures.  Accordingly,  measurement 
should  be  done  with  care  so  as  to  permit  the  correct  inferences  to  be  made. 

A hypothetical simple visual discrimination task will illustrate a detailed example of rule driven behavior.
In this experiment, the stimuli are purposely picked and will be either an H or an N, since they differ only in
the cross bar. They will be presented at the same point in space (and all other conditions are controlled).
When the H is presented, the operator is to push the left button to indicate response; when the N is
presented the operator will press the right button. The experiment is performed, but the operator is not
informed that his reaction time is being recorded in addition to accuracy. After
completing the data collection for this task, the operator is asked to do the experiment again. For this
second case, however, the operator is told about recording reaction time AND the operator is told to
respond as fast as possible. To continue our hypothetical example, after having finished the second task,
the operator is asked to do it yet a third time. The operator is told that the performance on the second
case contained too many errors and consequently the operator should be more accurate. Still, reaction
time will again be measured in this third case.

What would the collective results of such an experiment show? First, the average response time would
be different for each of the three experiments. The second case would be the fastest, the third case the
next fastest, and the first case the slowest. Second, the accuracy would also differ with Case 1 best and
Case 2 worst. Specifically, the pattern of performance is different for the three cases. Why? Basically,
because the operator was working under three different sets of instructions or three sets of rules. These
rules might be, in order of execution of the cases above:

Case 1. Do the visual discrimination as accurately as possible: Time is not a factor.

Case 2. Do the discrimination as fast as possible: Accuracy is less important than time.

Case 3. Do the discrimination as fast and accurately as possible: Accuracy and time are
of equal importance.

The time-accuracy tradeoff is a well known phenomenon in the reaction time literature (e.g., Posner,
1978; 1986). However, there is a catch: not all people use all of the rules, or use them in the manner logic
would dictate. Unless the instructions are explicit, any of the three rules may be used depending on the
individual. Only when the conditions are changed by instruction or by a situational demand is it possible
to determine which of the rules were used; that is, several levels of a workload variable need to be
included. A further requisite to discovering the rules is the measurement of two components of behavior,
time and accuracy. Without both measures, the discovery of this underlying rule structure would be
difficult. Additionally, had we measured it, we might have found that the force applied to the response
buttons differed for the three cases as well. This, and other measures of behavior, could provide
additional information about the fine structure of operator behavior.
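
The need to record both time and accuracy can be illustrated with a short Python sketch. The trial data, function names, and verbal labels below are invented for illustration only; they do not come from any study cited here.

```python
from statistics import mean


def summarize(trials):
    """Return mean reaction time (s) and proportion correct for one case.

    trials: list of (reaction_time_s, was_correct) pairs.
    """
    reaction_times = [rt for rt, _ in trials]
    n_correct = sum(1 for _, ok in trials if ok)
    return {"mean_rt": mean(reaction_times), "accuracy": n_correct / len(trials)}


def describe_shift(baseline, other):
    """Label how `other` differs from `baseline` on both components."""
    faster = other["mean_rt"] < baseline["mean_rt"]
    less_accurate = other["accuracy"] < baseline["accuracy"]
    if faster and less_accurate:
        return "speed traded for accuracy"
    if faster:
        return "faster without accuracy loss"
    if less_accurate:
        return "less accurate without speed gain"
    return "no shift toward speed"


# Made-up data: Case 1 (accuracy stressed) and Case 2 (speed stressed).
case1 = [(0.62, True), (0.58, True), (0.65, True), (0.60, True)]
case2 = [(0.38, True), (0.35, False), (0.40, True), (0.36, False)]
print(describe_shift(summarize(case1), summarize(case2)))
```

With reaction time alone, Case 2 would simply look "better"; only the pairing with accuracy exposes the change of rule.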

Application of Fine Structure Measures. The process of identifying fine-structure and rule-related
measures for Army systems may be illustrated with a hypothetical example (patterned after the
communication task experiments of Wierwille et al. [1985]). This communication task approach has been
shown to be sensitive to workload manipulations in another context (Green & Flux, 1977). The rules will
be executed according to perceived time demands. That is, an assumption is made that the operator will
not perform at a rule higher than the situation demands. These hypothetical rules might be:

Rule 1. If TIME IS AVAILABLE then DO NORMAL communications pace.

Rule 2. If TIME IS SHORT then SPEED UP speech rate.

Rule 3. If TIME IS CRITICAL then SHORTEN messages.

Rule 4. If TIME IS VERY CRITICAL then DELAY or ELIMINATE non-essential
communications.


Application of these rules by the operator would have important implications for performance and
simultaneously reflect changes in workload. The types of changes in performance one would expect for
each of the rules are as follows:


Rule  1  performance.  Normally  paced  performance.  This  forms  a  baseline  against 
which  to  compare  other  performance  characteristics  under  other 
rules. 


Rule 2 performance. Quicker response than Rule 1 and/or message compacted into a
shorter time. A few errors of omission.

Rule  3  performance.  Fewer  words  in  message  than  Rule  2,  rate  of  speaking  similar  to 
performance  with  Rule  2.  Possible  errors  of  omission. 

Rule  4  performance.  Few  words  in  message  and  spoken  fast  as  compared  with  Rule  3, 
less  essential  messages  omitted  or  delayed.  Both  errors  of 
commission  and  omission. 
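
The hypothetical rule set and its expected performance signatures can be written down as a small lookup, which is sometimes a convenient way to decide in advance which measures to record. The sketch below is an illustration only; the names `TimePressure`, `RULES`, and `select_rule` are assumptions introduced here, and the four levels simply mirror the four hypothetical rules above.

```python
from enum import IntEnum


class TimePressure(IntEnum):
    """Perceived time demand, ordered from lowest to highest."""
    AVAILABLE = 1
    SHORT = 2
    CRITICAL = 3
    VERY_CRITICAL = 4


# rule action and the performance signature expected under that rule
RULES = {
    TimePressure.AVAILABLE: (
        "normal communications pace",
        "normally paced performance (baseline)",
    ),
    TimePressure.SHORT: (
        "speed up speech rate",
        "quicker response and/or compacted message; a few errors of omission",
    ),
    TimePressure.CRITICAL: (
        "shorten messages",
        "fewer words, speaking rate as under Rule 2; possible errors of omission",
    ),
    TimePressure.VERY_CRITICAL: (
        "delay or eliminate non-essential communications",
        "few words spoken fast, messages shed; errors of commission and omission",
    ),
}


def select_rule(pressure):
    """Return the rule number, action, and expected signature for a pressure level."""
    action, signature = RULES[pressure]
    return {"rule": int(pressure), "action": action, "expected_signature": signature}
```

Because the levels are ordered, the assumption that the operator does not perform at a rule higher than the situation demands corresponds to choosing the lowest level consistent with the perceived time available.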


Some Previous Studies. Many tasks can be dissected in this way and anticipated operator
performance rules established. The exact types of performance rules and associated changes will differ
with the situation. For instance, pilots will make more control movements on the stick (and/or wheel) and
possibly throttle under heavy turbulence conditions than under light turbulence conditions. However,
just because one observes a change in the performance measure, one cannot necessarily conclude that
workload is higher. Use of an autopilot with manual throttle is certainly a lower workload condition as
compared with total manual control; nevertheless, the number of throttle changes increases substantially
under the autopilot mode (Dick, 1980). Indeed, this difference represents a rule driven performance
change, but not one caused by increased workload; the overall pattern of performance measures is
needed to identify the reasons for this finding. Similarly, a reduction in performance may reflect fatigue
more than workload per se (Angus & Heslegrave, 1983).


Bainbridge (1974; 1978) has reviewed and discussed performance rules and their role in determining
performance. For example, air traffic controllers were asked to find conflicts between aircraft. The
controllers used two methods: some controllers arranged flights under their control geographically and
others by altitude. Those controllers who used the altitude approach were able to perform better (faster
and more efficiently) than those who used geography (Leplat & Bisseret [1965] cited in Bainbridge, 1974).
Similarly, as the number of aircraft increased, there was increasing simplicity and decreasing redundancy in
messages (Sperandio [1971] cited in Bainbridge, 1974). These and other examples are indicative of
performance rule changes as a function of task demands and task loading.




Summary. Some Type 2 measures are conducive to identifying rules through the fine structure of
performance and others are not; that is, the measures vary in sensitivity for rule detection and
identification. There is no easily categorized structure of behavior which fits this fine structure analysis
approach. Generally, as in identifying the speed-accuracy tradeoff, it is necessary to employ several
different measures. Accordingly, one should attempt to tread the fine line between missing an important
parameter and burying the analyst in a flood of less relevant data (Hart, 1986a). Multiple measures provide
greater capabilities for probing aspects of rule driven performance as well as providing potentially
enhanced statistical sensitivity via multivariate analysis (O'Donnell & Eggemeier, 1986). Multiple measure
(including fine structure) assessment frequently will also serve to overcome the criticism of primary
measures as insensitive and non-diagnostic (e.g., Gopher & Donchin, 1986; O'Donnell & Eggemeier,
1986).

Development of Primary Task Measures


A major difficulty with primary measures is their potential lack of transferability across applications (Hicks
& Wierwille, 1979; Williges & Wierwille, 1979). The specific measures to be used must typically be
developed for each application and may not be used routinely in another application. The difficulty stems
from the simple fact that operators may perform different tasks in different systems, and consequently
outputs or work products differ. Of course, measures may be and should be adopted across systems in
cases where tasks are essentially unchanged (e.g., stick movements for aircraft evaluations, steering
wheel and accelerator movements for driving, communications in a variety of contexts). Because of the
potentially reduced transferability of primary measures, general guidance for their development is outlined
in the following discussion. Specific consideration is provided for selection, implementation, and
preliminary evaluation of reliability of primary measures.

Selecting Measurements on Primary Tasks

Appropriate primary task measures may be devised for each application. Remembering that measures
of performance have differing utilities, an investigator should identify measures that are most appropriate
for the application at hand. Where appropriate, Type 1 system performance measures may be identified
by examining system objectives and outputs. These measures might include number of targets detected,
number of targets fired on, accuracy and rate of firing, etc. The measures selected, of course, will be
highly dependent on the system under evaluation. For the Type 2 category, usually potential rate and
error measures for each task can be identified that provide the requisite direct mapping between operator
behavior and measurable performance (Hart, 1986a). In general, latency and error scores are excellent
candidates and have been reported as sensitive across a half dozen studies by O'Donnell and Eggemeier
(1986).

For identifying Type 2 rule related measures, asking an operator to describe the rules is helpful, but not
always sufficient. There are often differences between what operators do and what they think or say they do
(Spady, 1978a). One approach is for the investigator to do a preliminary evaluation before performing the
actual test. This can be done by monitoring operator task performances under known varied loads. As
load increases, the investigator may then discover the performance rules used by the operator for
adaptation or coping with the additional load. Candidate rule-related measures may then be chosen which
reflect this adaptation process.

An alternative is to use measures which have been shown to be successful. Meister (1985, pp. 256-263)
has considered issues of selecting task measures as well as provided a listing of possible measures for
system evaluations based on earlier work (Smode, Gruber, & Ely, 1962). Table 4-2 presents an extract of
this listing that may serve as a guide to selecting a variety of primary task measures. In the table, a
Category like TIME is a general class of measurement with three Subcategories, e.g., reaction time, which
in turn can be applied to several events listed under Description. In the communication example these
could include: (a) mean elapsed time between the end of received message and the beginning of the
next transmitted message, (b) mean length of each transmitted message, (c) variance in time between end
of received message and beginning of corresponding transmitted message, or (d) proportion of
messages shed (omitted). The process of identifying rule related measures ultimately involves the
mapping of expected rule usage to corresponding measures that reflect the use of such rules.
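
As an illustration, the four communication measures (a) through (d) could be computed from a time-stamped log along the following lines. The record layout (keys `rx_end`, `tx_start`, `tx_end`, with `None` marking a shed message) and the function name are assumptions made for this sketch, not a format used in the studies cited; the variance is computed as a population variance, which is likewise a choice made here.

```python
from statistics import mean, pvariance


def communication_measures(exchanges):
    """Compute measures (a)-(d) from a list of exchange records.

    Each record is a dict with times in seconds:
      rx_end   - end of the received message
      tx_start - start of the operator's reply, or None if the reply was shed
      tx_end   - end of the operator's reply, or None if the reply was shed
    """
    answered = [e for e in exchanges if e["tx_start"] is not None]
    latencies = [e["tx_start"] - e["rx_end"] for e in answered]
    lengths = [e["tx_end"] - e["tx_start"] for e in answered]
    return {
        "mean_latency": mean(latencies),               # (a)
        "mean_length": mean(lengths),                  # (b)
        "latency_variance": pvariance(latencies),      # (c)
        "proportion_shed": 1 - len(answered) / len(exchanges),  # (d)
    }


# Made-up log: four incoming messages, one of which goes unanswered.
log = [
    {"rx_end": 0, "tx_start": 1, "tx_end": 3},
    {"rx_end": 10, "tx_start": 13, "tx_end": 14},
    {"rx_end": 20, "tx_start": None, "tx_end": None},
    {"rx_end": 30, "tx_start": 32, "tx_end": 35},
]
measures = communication_measures(log)
```

Under the hypothetical rules discussed earlier, one would expect measure (b) to drop when messages are shortened and measure (d) to rise when non-essential communications are delayed or eliminated.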

Errors  are  an  especially  interesting  measure.  Errors  can  take  several  forms:  omission,  commission,  or 
wrong  order  of  execution.  Not  only  might  they  reflect  high  workload,  they  can  be  the  cause  of  increased 
workload  (Hart,  1986a).  When  an  error  is  made,  often  some  corrective  response  has  to  be  made  by  the 
operator.  This  adds  on  to  the  number  of  things  the  operator  has  to  do  and  increases  time  pressure.  If  the 
number  of  errors  is  substantial,  then  elimination  of  the  cause  of  the  errors  will  substantially  reduce 
workload. Any technique that helps diagnose the cause of errors will therefore be of general help.


Table 4-2. Varieties of primary task measure candidates (Meister, 1985).


CATEGORY: TIME

  Reaction time, i.e., time to
    • perceive event;
    • initiate movement;
    • initiate correction;
    • initiate activity following completion of prior activity;
    • detect trend of multiple related events.

  Time to complete an activity already in progress
    • identify stimulus (discrimination time);
    • complete message, decision, control adjustment;
    • reach criterion value.

  Overall (duration) time
    • time spent in activity;
    • percent time on target.

CATEGORY: FREQUENCY OF OCCURRENCE

  Number of responses per unit, activity, or interval
    • control and manipulation responses;
    • communications;
    • personnel interactions;
    • diagnostic checks.

  Number of performance consequences per activity, unit, or interval
    • number of errors;
    • number of out-of-tolerance conditions.

  Number of observing or data gathering responses
    • observations;
    • verbal or written reports;
    • requests for information.

CATEGORY: ACCURACY

  Correctness of observation; i.e., accuracy in
    • identifying stimuli internal to system;
    • identifying stimuli external to system;
    • estimating distance, direction, speed, time;
    • detection of stimulus change over time;
    • detection of trend based on multiple related events;
    • recognition: signal in noise;
    • recognition: out-of-tolerance condition.

  Response-output correctness; i.e., accuracy in
    • control positioning or tool usage;
    • reading displays;
    • symbol usage, decision making and computing;
    • response selection among alternatives;
    • serial response;
    • tracking;
    • communicating.

  Error characteristics
    • amplitude measures;
    • frequency measures;
    • content analysis;
    • change over time.


Additional guidance for deciding what to measure can be developed through established task
taxonomies like the Universal Operator Behaviors developed by Berliner, Angell, and Shearer (1964).
In this organization, human activities in systems are separated into four broad categories:

• Perceptual tasks are sensing tasks; for example, seeing a warning light on an
instrument panel.

• Mediational or cognitive tasks are those that involve thinking (e.g., solving
mathematical problems).

• Communication includes face-to-face speaking, radio, and other communication
tasks.

• Psychomotor processes are manipulative tasks; those which involve muscles or
movement (e.g., activating a pushbutton).

By spreading the selection of tasks across these categories, one increases the opportunities for
identifying performance measures that are sensitive to workload and are diagnostic of the causes of
workload. There is little point, for example, in measuring two tasks that fall in the same category. But it
would be highly useful to measure two tasks in different categories.

Implementation

Because the operator ordinarily performs the task as part of his duties, primary measures have the
advantage that they generally need not be intrusive on the operator nor require specialized training
(O'Donnell & Eggemeier, 1986; Williges & Wierwille, 1979). All that is required to obtain Type 1 and Type
2 measures is to instrument the system. There are many instrumentation methods that may be used and
their use is dependent on the application. In fielded systems, it may be necessary to add sensors as well as
transponding or recording equipment. Of course, there are some situations where, due to the absence of
ample space for adding instrumentation, there may be a physical space intrusion. This space intrusion is
typically less severe than the intrusion required for implementation of some other methods for
assessment of OWL (e.g., physiological). Moreover, such limitations can often be overcome with a bit of
ingenuity. In simulators, space is usually less of a problem; sensors are often already in place and may be
used for the purpose of gathering data on the primary measures. The proliferation and use of
microcomputers and interface cards has simplified implementation and reduced space requirements. In
general, primary task measures typically take up less space and have fewer implementation difficulties than
other methods for assessment of OWL. More importantly, implementation of primary task measures will
ordinarily be required for combat system evaluations in the context of MANPRINT considerations.

Reliability of Primary Task Measures

Primary task measures have the potential to provide important information about OWL. This potential
will not be realized, however, if the reliabilities of measures across sessions are inadequate. Frequently
assessed by correlation coefficients, reliability is the consistency of measurement and involves the
accuracy and stability of measures and the observational conditions under which the measures are made
(Meister, 1985). Addressing the growing concern with operational performance assessment (General
Accounting Office, 1982), Lane and colleagues have recently indicted low reliabilities (and resulting
inadequate sensitivities) as a major, chronic problem of such investigations (e.g., Lane, 1986; Lane,
Kennedy, & Jones, 1986). Their indictment of the operational literature parallels that directed at human
performance evaluations as part of environmental investigations (Bittner, Carter, Kennedy, Harbeson, &
Krause, 1986). These parallel indictments are based on mathematical arguments as to the limitations on
sensitivity imposed by low reliabilities.
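
The cited mathematical arguments are not reproduced in this report. As an illustration only, one standard result of this kind from classical test theory is the attenuation relation, in which the correlation that can be observed between two measures is the true correlation scaled by the square root of the product of their reliabilities. The sketch below assumes that relation; the function name and example values are hypothetical.

```python
import math


def max_observable_correlation(rel_x: float, rel_y: float, true_r: float = 1.0) -> float:
    """Attenuated correlation under classical test theory (assumed model).

    rel_x, rel_y: reliabilities of the two measures, each in [0, 1].
    true_r: correlation between the underlying error-free scores.
    """
    for rel in (rel_x, rel_y):
        if not 0.0 <= rel <= 1.0:
            raise ValueError("reliabilities must lie between 0 and 1")
    return true_r * math.sqrt(rel_x * rel_y)


# A primary task measure with reliability 0.49, compared against a perfectly
# reliable workload manipulation, cannot show a correlation above about 0.7.
ceiling = max_observable_correlation(0.49, 1.0)
```

Under this assumed model, a measure with low reliability caps the size of any effect it can reveal, which is the sense in which low reliability limits sensitivity.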

Much of the operational literature addressed by Lane and colleagues is concerned with flight
performance evaluation; however, they also point to other examples including operator performance in
armor (e.g., Biers & Sauer, 1982). The body of evidence points out the need to evaluate reliabilities and
sensitivities of primary measures before use in operational performance investigations. Reliabilities and
sensitivities may be evaluated using a number of direct and indirect approaches. Three of these
approaches are delineated below because of their particular utility in the context of combat system
evaluations.

Operational Test Experience. Measures may be selected based upon sensitivities and reliabilities
demonstrated  in  previous  operational  tests  of  similar  systems  under  test  conditions  generally  parallel  to 
those  planned.  Since  many  systems  are  derivatives  of  previous  systems,  this  approach  has  the 
advantage  of  building  upon  experience.  A  possible  disadvantage  is  that  of  discouraging  the 
development of potentially superior measures. This disadvantage can be overcome by using good,
demonstrated  measures  along  with  new  ones  which  are  specific  to  issues  of  interest  to  the  system.  For 
totally  new  systems,  the  practitioner  may  be  forced  to  develop  new  measures  but  can  build  on  experience 
to  the  extent  the  functions  and  missions  are  similar. 

Baseline Assessment or Pilot Test. Reliabilities of measures may be determined by administering
parallel operational test conditions to a group of subjects on two or more occasions as part of a baseline
evaluation before an operational test formally begins. A pilot test may be an opportunity to obtain the
baseline measurements. Although not widely appreciated, the results of such baseline evaluations may
be used to (a) evaluate the readiness and training of the operator-subjects, and (b) identify fundamental or
gross evaluation problems before resources are wastefully expended. Baseline assessments of
reliabilities and other measurement qualities have previously been applied by a number of researchers
(e.g., Bittner et al., 1986; Jobe & Banderet, 1986). Averaged correlations across occasions may be used
to identify measures with highest reliabilities and initial potential for sensitivity when the numbers of
operator-subjects are modest (Dunlap, Silver & Bittner, 1986).
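
A minimal sketch of the averaged-correlation idea follows. It assumes that each measure is scored for the same operator-subjects, in the same order, on every occasion, and that a simple arithmetic mean of the pairwise Pearson correlations is used; the averaging rule and all names here are illustrative assumptions, not the procedure of the cited authors.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary within each occasion")
    return sxy / math.sqrt(sxx * syy)


def mean_retest_correlation(occasions):
    """Average the correlations over every pair of occasions.

    occasions: list of score lists, one per occasion, with subjects in the
    same order on each occasion (hypothetical data layout).
    """
    pairs = list(combinations(occasions, 2))
    if not pairs:
        raise ValueError("need at least two occasions")
    return sum(pearson_r(a, b) for a, b in pairs) / len(pairs)
```

Candidate measures could then be ranked by this average, with the higher-ranked measures carried forward into the formal test.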

Theoretical Considerations. Reliabilities and sensitivities of measures may occasionally be
evaluated based upon theoretical considerations. Where aspects of a single performance may involve a
trade-off by an operator (e.g., accuracy and speed in a data input task), a theory-based measure (e.g.,
transmitted-bits/second) integrating these trade-off aspects may be simpler to consider, more reliable, and
more sensitive. Care is required before use of such theoretical integrations, however. For example, the signal
detection theory sensitivity metric (d') may not be applicable because of a low frequency of errors
(Parasuraman, 1986). In addition to this caveat, there are several related scoring procedures (occasionally
advocated to control for individual differences) whose use should be questioned, if not avoided. These
include slope, difference, and proportion of baseline procedures which have been attacked for low
reliabilities and on other grounds based upon both analytical and empirical results (cf. Bittner et al., 1986,
pp. 700-701).
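
The report names transmitted-bits/second without defining its computation. As a hedged sketch, the code below assumes the usual information-theoretic reading: the average information transmitted per trial (the mutual information between stimulus and response, estimated from a table of counts) divided by the mean response time. The function names and the data layout are hypothetical.

```python
import math


def transmitted_bits(confusion):
    """Average information transmitted per trial, in bits.

    confusion[i][j] is the number of trials on which stimulus i was
    answered with response j (a rectangular table of counts).
    """
    total = sum(sum(row) for row in confusion)
    if total <= 0:
        raise ValueError("confusion table must contain at least one trial")
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    bits = 0.0
    for i, row in enumerate(confusion):
        for j, count in enumerate(row):
            if count > 0:  # empty cells contribute nothing
                joint = count / total
                bits += joint * math.log2(count * total / (row_totals[i] * col_totals[j]))
    return bits


def transmitted_bits_per_second(confusion, mean_response_time_s):
    """Combine accuracy (bits per trial) and speed (seconds per trial)."""
    if mean_response_time_s <= 0:
        raise ValueError("mean response time must be positive")
    return transmitted_bits(confusion) / mean_response_time_s
```

An operator who speeds up at the cost of more errors lowers the bits per trial while shortening the time per trial, so the single rate reflects both sides of the trade-off.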

Summary.  These  three  reliability  considerations  are  directed  at  a  range  of  combat  system 
evaluation contexts. The operational test experience approach is applicable where an evaluation history
exists and the baseline assessment approach may be applied where some preliminary data can be
collected. The scoring consideration approach is applicable when there is neither an appropriate
evaluation  history  nor  an  opportunity  to  gather  preliminary  data.  These  three  reliability  approaches  appear 
to  span  most  evaluation  contexts. 

Overall Summary

Primary task measurements are divided into two categories. Type 1 measures are of the system
(including the operator) and are used to establish and verify the meeting of mission goals. Type 2
measures are of the operator directly and are used in the evaluation of OWL. Additionally, when several
Type 2 measures are used in combination, it is sometimes possible to assess the fine structure of
behavior  and  determine  performance  rules;  we  advocate  taking  several  Type  2  measures  in  every 
evaluation.  In  general,  Type  2  primary  measures  are  shown  to  be  sensitive  to  workload  variations  while 
Type  1  measures  are  not  typically  sensitive. 

Systems differ in their primary task(s), and measures used in one situation may not always be applicable
in another situation. Accordingly, general guidelines are laid out for selection of measures and their
implementation. A list of possible measures, based on time, accuracy, and frequency, is presented.
Special consideration is given to reliability of measures and the means to assess that reliability.


CHAPTER  5.  SUBJECTIVE  METHODS 


"If  the  person  feels  loaded  and  effortful,  he  is  loaded  and  effortful  whatever  the 
behavioral  and  performance  measures  show”  (Johanssen,  Moray,  Pew,  Rasmussen, 
Sanders,  &  Wickens,  1979,  p.  105). 

"...mental  workload  should  be  defined  as  a  person's  private  subjective  experience  of 
his or her own cognitive effort” (Sheridan, 1980, p. 1).

The  primary  purpose  for  the  use  of  subjective  methods  is  to  gain  access  to  the  experiences  of  the 
operator.  Physical  workload  can  be  observed,  but  mental  workload  occurs  internally  and  can  only  be 
inferred  by  observers.  Subjective  methods  seek  to  obtain  and  quantify  the  opinions,  judgments,  and 
estimations  of  the  operators.  Indeed,  some  investigators  suggest  that  subjective  methods  are  the  most 
appropriate  methods  by  which  to  measure  workload.  For  example,  when  mental  workload  is  defined  as  "a 
person's  private  subjective  experience  of  his  or  her  own  cognitive  effort”  then  workload  measurement  is 
"best  and  most  directly  done  by  a  subjective  scale”  (Sheridan,  1980,  p.  1). 

Investigators  interested  in  mental  activities  have  worked  on  measurement  and  scaling  of  judgments. 
Many mathematical techniques are available to handle subjective opinion; in recent years some of these
have been applied to workload. There has been much written on the use of subjective methods for
measuring workload. The multitude of reviews indicates quite clearly the attitude of the research
community toward the extensive use and the value of subjective methods (Gartner & Murphy, 1976; Moray,
1979b, 1982; O'Donnell & Eggemeier, 1986; Wierwille & Williges, 1978, 1980; Williges & Wierwille,
1979).  Although  some  researchers  think  that  subjective  reports  are  of  low  value,  most  think  these 
methods  can  provide  significant  information  about  operator  workload  (Hart,  1986a). 

There  are  many  reasons  for  the  widespread  use  of  subjective  measures.  As  outlined  by  O'Donnell  and 
Eggemeier (1986), these include:

•  easy  to  implement;  little  (if  any)  equipment  is  needed; 

•  relatively  non-intrusive; 

•  inexpensive  (i.e.,  cost  of  the  measure  is  low); 

•  face  validity  (at  the  least); 

•  many  good  techniques  exist;  and 

• current data suggest they are sensitive to workload variations.

It is important to be familiar with the various subjective techniques currently available and the research that
has been performed in their development and refinement. Literature describing applications of subjective
workload measures gives examples of how and for what types of systems subjective workload measures
have been used.

The subjective methods can be broken into two broad classes: (a) rating scales and (b) questionnaires
and interviews. Expert opinion might be considered an associated type of subjective measure, but that
method of workload assessment was discussed earlier in Chapter 3. Rating scales employ psychometric
scaling methods to derive scales with which qualitative estimates of some behavior or characteristic can
be made. Questionnaires and interviews rely on written or oral reports and while there may be quantitative
aspects to these data, for the most part, the data obtained are qualitative. Rating scales, questionnaires,
and interviews have been used extensively in workload assessment. These methods are reviewed and
specific subjective techniques for workload assessment are presented, analyzed, and compared in the
following sections. At the end of each discussion the technique or method is summarized. Before
discussing each technique, an overview of the nature and properties of measurement scales is provided
to set the stage for later discussion.

Levels  of  Measurement  and  Scales 

There is a wide body of information on the use of scales to measure psychological variables. Many
ways to create scales have been developed and the resulting scales may have different properties; each
may be appropriate for different circumstances. Which method to use depends on the questions that
need answering as well as practical considerations. As background to the description and discussion of
specific techniques that have been developed and used for operator workload assessment, a brief
discussion of various scale characteristics will be presented.

There are four widely-used levels of measurement: nominal, ordinal, interval, and ratio. Nominal
measures only classify objects and distinguish classes of items. Ordinal measures place objects in order
of magnitude, although distance between the objects is not defined. For example, on an ordinal
measurement scale, a stick with a rank of 4 would not necessarily be twice as long as a stick with a rank of 2.
Interval levels of measurement possess equal intervals between objects; there is a standard unit of
measure but without a fixed zero point, e.g., a thermometer. Ratio measures have equal intervals and a
known zero point. A ruler is an example of a ratio measurement; a 6-inch stick is twice as long as a 3-inch
stick. Ratios can then be formed and statements about the relative amount of a characteristic being
measured can be made (Allen & Yen, 1979).

A scale is an organized set of measurements (Allen & Yen, 1979). The different types of scales can be
produced by different methods. Scales that list values of a property along a line, even if the properties are
placed an equal distance apart on the line, are at least ordinal and may be interval. Just because lines are
equidistant on a piece of paper does not mean the scale is interval. The method of paired comparisons
also produces scales with ordinal levels of measurement. Interval scales can be produced through
Thurstone's method of comparative judgments or conjoint measurement. Ratio scales can be obtained
using methods of estimation where observers effectively make judgments of the ratio between the
magnitudes of two perceptions. These methods are described in detail in books on psychological or
psychophysical measurement (e.g., Edwards, 1957; Gescheider, 1985). However, it is important to
realize that the way in which scales are developed will determine whether a scale has nominal, ordinal,
interval or ratio properties. This level of measurement in turn will be one of the major factors in determining
how the data can be interpreted.
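
The practical difference between interval and ratio levels can be shown with a small worked example. It extends the stick and thermometer illustrations above; the specific temperatures and the choice of Celsius and Fahrenheit are assumptions made only for the demonstration.

```python
def inches_to_cm(inches: float) -> float:
    """Length is a ratio measure: conversion is a pure rescaling."""
    return inches * 2.54


def celsius_to_fahrenheit(celsius: float) -> float:
    """Temperature in degrees C or F is interval: conversion also moves zero."""
    return celsius * 9.0 / 5.0 + 32.0


# Ratio level: "twice as long" survives a change of units.
length_ratio_in = 6.0 / 3.0
length_ratio_cm = inches_to_cm(6.0) / inches_to_cm(3.0)

# Interval level: "twice as hot" does not survive a change of units.
temp_ratio_c = 20.0 / 10.0
temp_ratio_f = celsius_to_fahrenheit(20.0) / celsius_to_fahrenheit(10.0)

# Interval level: comparisons of differences do survive.
diff_ratio_c = (30.0 - 20.0) / (20.0 - 10.0)
diff_ratio_f = (celsius_to_fahrenheit(30.0) - celsius_to_fahrenheit(20.0)) / (
    celsius_to_fahrenheit(20.0) - celsius_to_fahrenheit(10.0)
)
```

By the same reasoning, a statement such as "task A imposed twice the workload of task B" is only meaningful if the workload scale in use has ratio properties.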

Another characteristic of a measurement scale is its dimensionality. In essence this is an indication of
what the scale is intended to measure. There can be unidimensional scales that are intended to measure
only one aspect or attribute. Multidimensional scales, on the other hand, are intended to measure more
than one dimension concurrently. Specific statistical methods are available to create multidimensional
scales and these have been used to create scales that specifically address OWL. Whether the scale is
uni- or multidimensional has implications as to what is to be measured (i.e., what is to be rated) and how
workload is conceived. For example, a global, unidimensional rating of workload implies that there is a
single attribute of workload that can be identified and rated. For such a rating, operators have to combine
internally all aspects of workload into a single metric, and the degree to which various aspects contributed
to the overall rating is not ascertainable. Tsang and colleagues have employed such a unidimensional
overall workload scale using a line divided into 20 intervals with the end points anchored at low and high
workload (Tsang & Johnson, 1987; Vidulich & Tsang, 1987). With multidimensional scales it is especially
important that the relative importance of the various measured components of workload be identified
explicitly. For example, the NASA Task Load Index (TLX) uses six dimensions while the Subjective
Workload Assessment Technique (SWAT) uses three.

Operators and observers can both be asked to make ratings. Operators can make judgments about
their subjective experiences. At the same time, observers could monitor the behavior of the operators
and make judgments about the level of workload the operator is experiencing. This is essentially a
subjective opinion about someone else's subjective experience. However, since observers cannot
observe internalized operator activities such as information processing and monitoring, their judgments
may be less useful than those of the operator. On the other hand, observers may be able to see more
than a busy operator (Hart, 1986a). For example, operators may experience tunnel vision, while the
observer maintains a larger field of view. The use of ratings of the same tasks or mission segments by both
operators and observers may provide more reliable information regarding OWL.

Workload  Rating  Scales 

Subjective  scaling  techniques  have  been  used  to  develop  rating  scales  for  workload  measurement.  In 
general, these rating scales have been developed in aviation communities for measurement of pilot
workload, with the exception of the Modified Cooper-Harper scale. In some instances, the rating scale is
specific to pilot activities and would need modification to extend its applicability to non-piloting activities. In
other instances, the scale would be applicable as it exists to a wide range of tasks and environments. The
degree  of  applicability  has  been  noted  where  appropriate. 

The specific subjective scale techniques described in this report include:

• the Cooper-Harper scale and modified versions of the scale which use a decision tree
structure,

•  the  NASA-Task  Load  Index  and  Bipolar  scales  that  obtain  individual  weighted  scores 
of  several  dimensions  of  workload, 

•  psychometric  techniques,  such  as  magnitude  estimation  and  equal-interval  scales, 
and 

•  the  Subjective  Workload  Assessment  Technique  (SWAT)  which  uses  conjoint 
analysis. 

Other  rating  scales  which  have  been  developed  and  used  for  specific  purposes  are  discussed  as 
examples  of  applications.  Comparisons  among  the  techniques  as  well  as  other  key  issues  are  discussed 
following presentation of the techniques.

Cooper-Harper Scale and Variations

Perhaps the most widely used workload-related decision tree rating scale is the Cooper-Harper (CH)
scale (Cooper & Harper, 1969). It is a 10-point unidimensional rating scale, resulting in a global rating on
an ordinal scale of the experience of piloting an aircraft. It was primarily intended for use by pilots to rate
aircraft handling and control qualities, but pilot workload and compensation are mentioned in the scale
shown in Figure 5-1. O'Donnell and Eggemeier (1986) review the supporting evidence which shows a
relation between CH ratings and workload (e.g., Hess, 1977). Wierwille and Connor (1983) also found the
CH scale to be sensitive to handling properties. They concluded that the CH scale can be confidently
used for tasks that are primarily motor or psychomotor. These findings are generally supported by
previous workload literature (Wierwille & Williges, 1980). Haworth, Bivens, and Shively (1986) have
recently found a correlation of .75 between CH ratings and NASA Bipolar ratings and a correlation of .79
between CH and SWAT ratings, indicating considerable agreement among the scales and overlap of the
underlying psychological dimension.


Workload and effort in the Honeywell version may be considered task-related, rather than the terms
compensation and deficiencies in the CH, which are more hardware, especially aircraft, oriented. This
scale was used in a study of vertical takeoff and landing (VTOL) aircraft displays (North, Stackhouse, &
Graffunder, 1978). In general, the ratings were in agreement with the performance data; however, scores
were obtained for only a subset of all conditions. North et al. did not draw strong conclusions concerning
the use of this scale because not all factors influencing workload were rated.


Figure 5-2. The Honeywell version of the Cooper-Harper scale.

Modified Cooper-Harper. Wierwille and Casali (1983) developed the Modified Cooper-Harper
scale for the purpose of workload assessment in systems where perceptual, mediational, and
communications activity is present (Figure 5-3). The modification was developed for use in those
situations where the task was not primarily motor or psychomotor and the CH might not be appropriate.
Wierwille and his colleagues have performed a series of laboratory experiments to validate the Modified
CH as a workload assessment technique. Three experiments using this scale are described in Wierwille
and Casali (1983). These experiments were performed in a simulated aircraft environment and all were part
of larger studies. Six licensed pilots participated as subjects in each experiment. Perceptual tasks
involved the identification of danger indicators and required a pushbutton response. One of three load
levels (low, medium, or high) was used for each flight. After each flight, subjects gave a Modified CH
rating. The results indicate scores were significantly different for each level of load, with the score
increasing monotonically with load. The experiment that looked at mediational (cognitive) load used
navigation tasks involving various numbers and complexities of arithmetic and geometric operations for each
load level. The navigation solutions were only calculated, not implemented, so the psychomotor
components did not differ. Results showed significant differences between low vs. high and medium vs.
high, with the score means increasing monotonically with load. The communications experiment involved
the use of radio aircraft control and communications tasks including commands for changes in altitude and
heading and communications such as reporting call signs and heading, altitude, and airspeed information.
Significant differences were found between low vs. medium load and low vs. high load. Score means
increased monotonically with load level. The authors conclude that the Modified CH scale ratings are valid
and statistically reliable measures of overall workload and that the Modified CH shows a consistent, good
level of sensitivity across the three types of tasks (Wierwille, Casali, Connor, & Rahimi, 1985). Modified CH
ratings were found to be as sensitive to task difficulty as SWAT ratings (Warr, Cole, & Reid, 1986).

Figure 5-3. The Modified Cooper-Harper scale (Wierwille & Casali, 1983).

Wierwille,  Skipper  and  Rieger  (1984)  conducted  two  studies  to  test  whether  the  sensitivity  of  the 
Modified  CH  could  be  increased  by  changing  from  a  10-point  to  a  15-point  scale  or  by  changing  the  format 
to computer-based or tabular form. In general, they concluded that the original Modified CH was the most
consistently  sensitive  measure  of  the  five  alternatives  tested. 

The Bedford Scale. The Pilot Workload Rating Scale, also called the Bedford scale, is a decision
tree scale derived from the CH. It is shown in Figure 5-4. It was developed by Roscoe and Ellis (Roscoe,
1987a) at the Royal Aircraft Establishment, Bedford, England for workload assessment in the military
aviation environment. The technique obtains subjective judgments about workload based on ability to
complete tasks and the amount of spare capacity available. It was found that aircrew were able to
understand the scale and that it was easy to remember and small enough to be carried on a flight suit knee
pad (Lidderdale, 1987).

The Bedford scale has been applied in several workload evaluations of aircrews. Wainwright (1987)
reports its application to assess workload for a minimum crew of two pilots of a civilian (BAE 146) aircraft.
Three teams of two pilots each participated in the certification program. The evaluation was based on
subjective opinion and heart rate monitoring for high workload segments with crews that were asked to fly
long duty days with minimum rest. Pilots gave ratings and an observer of the pilot's performance gave a
rating as well as the signal to the pilot to rate the previous task. The overall analysis of workload, including
subjective measures, heart rate and performance errors, suggested that the two-pilot crews were not
overloaded, i.e., crew members reported they had spare capacity.

A similar study measured workload for a two person crew in an advanced combat aircraft during low level
maneuvers. Although there was concern that real-time ratings would not be possible, the aircrew were
able to give in-flight ratings even in demanding circumstances (Lidderdale, 1987). However, the Bedford
scale was found to be inappropriate for obtaining workload assessments during post-flight debriefings.
The aircrew found it difficult to reconstruct the complex experiences of the flight and thus they could
not be confident in the accuracy of their responses. No other discussion was made of this point, so it is
not clear whether the post-flight rating difficulty was due to workload descriptions in the Bedford scale
itself or a more general problem that would occur in all post-flight ratings of workload.

Figure 5-4. The Bedford scale (described in Roscoe, 1987a).

The use of the Bedford scale was well accepted by aircrews (Lidderdale, 1987), particularly when tasks
are short and well defined (Roscoe, 1987a). Rating pads with 10 push buttons were used as the means to
obtain the ratings. Roscoe has identified some limitations in the scale's use: the ratings given are not
absolute values and are dependent on the operator's personal experience; therefore, comparisons
between operators are not valid. Also, real-time ratings may not be possible if a second person is not
available. Like other versions of the CH scale, the Bedford scale produces ordinal data and therefore
statistical analysis is limited to rank order tests.
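
As one example of a rank order analysis suited to ordinal ratings, the sketch below computes Spearman's rank correlation, using average ranks for ties, between a set of ratings and a second measure such as heart rate. The choice of Spearman's coefficient and the function names are illustrative assumptions; the report does not prescribe a particular test.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("values must not all be tied")
    return sxy / math.sqrt(sxx * syy)
```

Because only the ordering of the ratings enters the calculation, the result does not depend on treating the distance between, say, ratings of 3 and 4 as equal to the distance between 7 and 8.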

Bedford ratings were found to correlate well with heart rate, although not always consistently (e.g.,
Wainwright, 1987). The use of the Bedford scale has been primarily in applied settings; only one study
was found where it was used in a controlled setting with defined levels of task difficulty. Tsang and
Johnson (1987) used the Bedford scale, the NASA-TLX, and an overall workload scale to measure
subjective workload in several manual and semi-automated tasks. The Bedford scale ratings were slightly
different from those obtained with the other two measures, although the authors suggest these findings
support the ability of the Bedford to measure spare capacity. However, these findings are based on a
small amount of data and should be used cautiously.

Summary. Workload rating scales based on decision tree structures have been found to be
sensitive to different levels of workload in various task types (e.g., Wierwille et al., 1985). The scales have
been found to be easy to administer and well accepted by operators. These rating scales have been used
almost exclusively in aviation research to assess pilot workload; however, the Modified CH and the
Bedford scales would be applicable to other operational environments (with minor modifications such as
changing the word pilot to operator). Finally, interpretations other than as ordinal scales should be
approached with great caution because of the nature of the scales.

NASA-Ames Workload Rating Scales

The  Human  Performance  Group  at  the  NASA-Ames  research  facility  has  been  extensively  involved  in 
workload  assessment  research.  As  part  of  the  overall  effort,  much  work  has  gone  into  the  development  of 
workload  rating  scales  as  subjective  measurement  techniques.  Two  major  theoretical  considerations 
influenced  the  scale  development.  The  first  consideration  was  the  multidimensional  nature  of  workload, 
resulting in multiple workload dimensions. The second consideration was the individual nature of which
dimensions of workload are more important for individual operators rating specific tasks. This
consideration  led  to  development  of  individual  weighting  procedures. 

The  NASA-Bipolar  scales  are  a  group  of  nine  scales  that  reflect  nine  dimensions  of  workload  plus  an 
overall  workload  scale.  The  descriptions  of  the  ten  scale  dimensions  are  presented  in  Table  5-1.  Each 
scale  is  presented  as  a  single  line  broken  into  20  spaces  as  shown  In  Figure  5-5.  The  operator  marks  the 
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location  on  the  scale  that  corresponds  to  his  or  her  subjective  experience  related  to  a  specific  task.  A 
score  from  0  to  100  is  obtained  for  each  scale  (assigned  to  the  nearest  5).  The  ratings  are  assumed  to 
have  interval  properties.  The  weighting  procedure  used  to  combine  individual  scale  ratings  involves  a 
paired  comparison  task  using  all  pairs  of  individual  dimensions.  Paired  comparisons  require  the  operator 


Table  5-1 .  NASA  Bipolar  rating  scale  descriptions  (Hart  &  Stave  Land,  1987). 


TWe 

Descriptions 

OVERALL  WORKLOAD 

Low,  High 

The  total  workload  associated  with  the  task 
considering  all  sources  and  components. 

TASK  DIFFICULTY 

Low,  High 

Whether  the  task  was  easy  demanding, 
simple  or  complex,  exacting  or  forgiving. 

TIME  PRESSURE 

None,  Rushed 

The  amount  of  pressure  you  felt  due  to  the 
rate  at  which  the  task  elements  occurred. 
Was  the  task  slow  and  leisurely  or  rapid  and 
frantic. 

PERFORMANCE 

Perfect,  Failure 

How  successful  you  think  you  were  in  doing 
what  we  asked  you  to  do  and  how  satisfied 
you  were  wit*'  what  you  accomplished. 

MENTAL/SENSORY  EFFORT 

None,  Impossible 

The  amount  of  mental  and/or  perceptual 
activity  that  was  required  (e  g.,  thinking, 
deciding,  calculating,  remembering,  looking, 
searching,  etc.). 

PHYSICAL  EFFORT 

None,  Impossible 

The  amount  of  phy.icai  activity  that  was 
required  (e.g.,  pushing,  pulling,  turning, 
controlling,  activating,  etc.). 

TRUSTRAT1CN  LEVEL 

Fulfilled,  Exasperated 

How  insecure,  discouraged,  irritated,  and 
annoyed  versus  secure,  gratified,  content, 
and  complacent  you  felt. 

STRESS  LEVEL 

Relaxed,  Tense 

How  anxious,  worried,  uptight,  and  harassed 
or  calm,  tranquil,  placid,  and  relaxed  you  felt. 

FATIGUE 

Exhausted,  Alert 

How  tired,  weary,  worn  out,  and  exhausted 
or  fresh,  vigorous,  and  energetic  you  felt. 

ACTIVITY  TYPE 

Skill  Based,  Rule  Based, 
Knowledge  Based 

The  degree  to  which  the  task  required 
mindlass  reaction  to  well-learned  routines  or 
required  the  application  of  known  rules  or 
required  problem  solving  and  decision 
making. 

IMM'M’iTMi  M  'll11  'Mill1 
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Figure  5-5.  The  NASA  Bipolar  rating  scales  (adapted  from  Boitolussi,  Karrtowitz,  &  Hart,  1966). 

to choose which dimension is more relevant to workload for a particular task across all pairs of the nine
dimensions. The number of times a dimension is chosen as more relevant is the weighting of that
dimension scale for a given task for that operator. The procedure permits a weighting of zero for
dimensions that are judged as not relevant to workload for that task (Hart, Battiste & Lester, 1984). A
workload score from 0 to 100 is obtained by multiplying the weight by the dimension scale score, summing
across scales, and dividing by the total weights (36 paired comparisons). The weighting procedure
implicitly assumes that the individual dimensions have ratio scale properties. The weighting procedure
has been found to reduce between-subject variability by up to 50% compared to unidimensional overall
workload rating (Hart et al., 1984; Miller & Hart, 1984).
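
The weighting and scoring arithmetic described above can be sketched as follows. This is a minimal illustration, not NASA's software: the dimension labels are an assumption (the nine Table 5-1 scales other than Overall Workload), and the operator's preferences and ratings are hypothetical values chosen only to exercise the formula.

```python
from itertools import combinations

# Assumed labels: the nine Table 5-1 scales other than Overall Workload.
DIMENSIONS = [
    "task difficulty", "time pressure", "performance",
    "mental/sensory effort", "physical effort", "frustration",
    "stress", "fatigue", "activity type",
]

def pairwise_weights(dimensions, more_relevant):
    """Weight of a dimension = number of pairs in which it was chosen."""
    weights = {d: 0 for d in dimensions}
    for a, b in combinations(dimensions, 2):
        weights[more_relevant(a, b)] += 1
    return weights

def weighted_workload(ratings, weights):
    """Sum of weight x rating over all scales, divided by the total weight."""
    total = sum(weights.values())
    return sum(weights[d] * ratings[d] for d in weights) / total

# Hypothetical operator who always prefers the dimension listed earlier.
def prefer_earlier(a, b):
    return a if DIMENSIONS.index(a) < DIMENSIONS.index(b) else b

weights = pairwise_weights(DIMENSIONS, prefer_earlier)

# Hypothetical ratings on a 0-100 scale.
ratings = {d: 50 for d in DIMENSIONS}
ratings["task difficulty"] = 90

score = weighted_workload(ratings, weights)
print(weights)
print(round(score, 1))
```

With nine dimensions the weights always sum to 36, and a dimension that is never chosen receives a weight of zero and drops out of the score.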

The NASA Task Load Index (TLX) was derived from the NASA-Bipolar scales and uses a similar
weighting procedure. It may be considered a shorter and more refined version. The NASA-TLX uses six
dimensions, thereby considerably reducing the number of paired comparisons from 36 to 15. Aspects of
task, behavior, and the operator are all included in the TLX. The first three dimensions can be considered
as characteristics of the task; the next two can be considered as behavioral characteristics; and the final
scale is related to the operator's individual characteristics. The six dimensions are:

• mental demand,

•  physical  demand, 

•  temporal  demand, 

•  performance, 

•  effort,  and 

•  frustration. 

The  descriptions  of  these  dimensions  are  shown  in  Table  5-2.  Twenty-step  bipolar  scales  are  used  as  the 
means to obtain ratings for these dimensions, as shown in Figure 5-6. Several factors were considered in
choosing  which  dimensions  to  include  in  the  TLX.  Criteria  such  as  dimension  sensitivity,  independence 
from  other  dimensions,  and  subjective  importance  to  individual  concepts  of  workload  were  considered. 
For  ease  of  implementation  (both  in  the  weighting  procedure  and  the  actual  rating  of  scales),  no  more  than 
six  dimensions  were  desired.  A  thorough  discussion  of  the  development  of  the  NASA-TLX  is  presented 
in  Hart  and  Staveland  (1987). 

Both the NASA-TLX and Bipolar scales have been used in laboratory and operational environments.
These applications are characterized in the following descriptions. TLX was used in studies of pilot
workload in helicopters (Shively, Battiste, Matsumoto, Pepitone, Bortolussi, & Hart, 1987). Four NASA test
pilots flew an SH-3G helicopter on two different mission scenarios for a total of eight flights. Subjective
and physiological data were collected during the flight. The TLX rating scales were administered at the
end of each flight segment. During the rating, the pilot transferred control of the helicopter to the safety
pilot. After rating completion, control was returned to the pilot. If control transfer to the safety pilot could
not be done without excessive disruption, the pilot rating for that segment would be delayed until after the
next flight segment. Pilots were never required to rate more than two consecutive segments at one time,
and each flight segment contained a major flight task such as hover, terrain following, or landing.


Table  5-2.  NASA  TLX  rating  scale  description  (NASA-Ames  Research  Center,  1986). 


Title               Endpoints               Description

Mental Demand       Very Low/Very High      How mentally demanding was the task.

Physical Demand     Very Low/Very High      How physically demanding was the task.

Temporal Demand     Very Low/Very High      How hurried or rushed was the pace of the
                                            task.

Performance         Perfect/Failure         How successful were you in accomplishing
                                            what you were asked to do.

Effort              Very Low/Very High      How hard did you have to work to accomplish
                                            your level of performance.

Frustration         Very Low/Very High      How insecure, discouraged, irritated, and
                                            annoyed were you.

Results indicate that TLX significantly discriminated between flight segments in both scenarios.
Subjective ratings and available performance measures were compared and appeared to be related,
with a lower workload rating corresponding to better performance. Statistical analyses were
not performed due to the limited amount of performance data available. However, the TLX measures
appeared to be sensitive to both flight segment differences and performance measures.

Other applications include the use of the Bipolar scales in a laboratory study where short-term memory
load, tracking task difficulty, and time-on-task were the manipulated variables (Biferno, 1985). Subjective
ratings were found to correlate positively with certain physiological measures of workload. Ratings of
fatigue and workload were significantly correlated for 89% of the subjects.

Bortolussi, Hart and Shively (1987) found that the Bipolar scales differentiated significantly between
low and high levels of scenario difficulty in a motion-based simulator when 21 flight-related activities were
added in the high difficulty scenario. These results replicate results obtained in a similar study (Bortolussi,
Kantowitz, & Hart, 1986), supporting the reliability of the subjective ratings in different experiments using
the same tasks but different subjects.


Figure  5-6.  The  NASA  Task  Loading  Index  (TLX)  rating  scales  (NASA-Ames  Research  Center,  1986). 


Vidulich and Pandit (1986) found the Bipolar scales to be sensitive to the effects of training on
subjective workload ratings when the training produced lower cognitive load through development of
automaticity in a category search task.

Several comparative studies have used one of the NASA scales as well as other OWL subjective
techniques. The NASA scales have had high correlations with other subjective measures. Haworth,
Bivens and Shively (1986) used the NASA-Bipolar scales in assessment of single pilot workload for
helicopter nap-of-the-earth (NOE) missions. The correlation of NASA-Bipolar was 0.79 with Cooper-Harper
and 0.67 with SWAT. In a study by Tsang and Johnson (1987), TLX and a unidimensional overall
workload scale followed very similar trends. Vidulich and Tsang (1987) found similar correlations among
TLX, an overall workload scale, and the Analytic Hierarchy Process.

Vidulich and Tsang (1985a, 1985b, 1986) compared the subjective measures obtained from NASA-Bipolar
and SWAT for both tracking and spatial transformation tasks. Both techniques were found to be
sensitive to various levels of task demands and generally provided similar results. A comparison of the two
techniques shows that the NASA-Bipolar scales result in less between-subject variability but use more
dimensions of workload (although at the time, it was unclear whether all nine dimensions added
information). NASA-Bipolar required less time in the weighting procedure compared to SWAT's scale
development procedure, but more in actual ratings because of the nine scales as compared to the three
dimensions of SWAT. A similar comparison between NASA-TLX and SWAT has not yet been reported in
the literature.

Of the two NASA scales, the TLX scale is the version that is recommended by NASA. Information for
administration of TLX is contained in Collecting NASA Workload Ratings: A Paper and Pencil Package
(NASA-Ames Research Center, 1986). Contained in this package are copies of the six rating scales, the
fifteen paired comparisons, sources of workload tally sheets, and instructions on the procedures to follow
to obtain individually weighted workload scores. A computerized version is also available which provides
software that will display the rating scales, tally the sources of workload, and provide the weighted scores.
Because TLX is newer, not as much research has been reported for it as for the full Bipolar version.
Further research and application examples will provide additional information with which to characterize the
TLX scale fully.

As with other multidimensional scales, not only can an overall workload score be obtained, but the
individual scales could be used to diagnose what aspects of workload were particularly relevant for a
specific task. The ability to identify what task, behavior, or operator characteristic was judged to have the
greatest impact on the perception of workload would provide an additional diagnostic tool to assess
system design alternatives.

Summary. Both the NASA-Bipolar and TLX scales have been proven to be valid, reliable and
sensitive techniques for OWL assessment. The scales have been used in laboratory and applied
settings. The multidimensional nature of workload and the relevance of various workload dimensions to
individual assessment of workload are both accounted for in the individual weighting procedure and six
dimensions used in TLX. TLX was derived from the Bipolar scales and is the technique currently
recommended by NASA. Certainly, the approach used by both scales is useful. The TLX is more practical
for operational applications because it is shorter and takes less time to complete. TLX is only beginning to
be characterized, although the validity of its underlying approach is supported by its predecessor's (Bipolar
scales) research base.

Psychometric Techniques


Among the rating scale techniques available are those that are based on classic psychometric scaling
methodologies. Psychologists have used these methods as a means of quantitatively measuring
psychological attributes. Workload might be considered to be such an attribute. Among the best known
are magnitude estimation, paired comparisons, and equal appearing intervals.

Magnitude estimation. Magnitude estimation is a psychophysical method that requires a subject
to make direct numerical assignments to the magnitude of some sensory experience. It is one of the most
frequently used psychophysical scaling methods (Gescheider, 1985). There are two main procedures for
obtaining magnitude estimation (Stevens, 1958; 1975). In the first method, a subject is presented with a
standard stimulus and is told this experience represents a certain numerical value (called a modulus). The
subject is then asked to make judgments relative to the modulus. For example, if the modulus is assigned
10 and the experience is judged to be twice as great as the one created by the standard stimulus, the
subject would say 20. In the second method, the modulus is not defined by the experimenter and the
subject is asked, in essence, to establish his own modulus.
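
The two procedures can be illustrated with a short sketch. The first function simply restates the modulus arithmetic from the example above. The second shows one common way of putting free-modulus estimates from different subjects on a comparable footing, dividing each subject's estimates by their geometric mean; that normalization is an assumption introduced here for illustration and is not described in the studies reviewed. All numbers are hypothetical.

```python
import math

def estimate_from_ratio(modulus, judged_ratio):
    """Fixed modulus: the estimate is the modulus scaled by the judged ratio."""
    return modulus * judged_ratio

def normalize_free_modulus(estimates):
    """Free modulus: divide by the subject's geometric mean so that
    subjects who chose different personal moduli become comparable."""
    log_mean = sum(math.log(x) for x in estimates) / len(estimates)
    geometric_mean = math.exp(log_mean)
    return [x / geometric_mean for x in estimates]

# Modulus of 10, experience judged twice as great as the standard.
fixed = estimate_from_ratio(10, 2)

# Two hypothetical subjects giving the same ratios on different personal scales.
subject_a = normalize_free_modulus([5, 20, 80])
subject_b = normalize_free_modulus([50, 200, 800])
print(fixed, subject_a, subject_b)
```

After normalization the two hypothetical subjects yield the same values, since only the ratios among their estimates are retained.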

Some research has used magnitude estimation in workload assessment (e.g., Borg, 1978; Helm &
Heimstra, 1981). High correlations have been reported between subjective estimates of workload and
task difficulty (e.g., Helm and Heimstra used information load (bits/sec) as the measure of task difficulty).
Masline (1986) found equal sensitivity among magnitude estimation, equal appearing intervals, and
conjoint scaling as used in SWAT. Gopher and Braune (1984) describe the use of magnitude estimation
scaling for workload assessment in 21 experimental conditions. A single-axis tracking task was used as
the modulus and given a value of 100. After each trial, subjects were asked to estimate the load or
demand of other tasks. Gopher and Braune found that subjects did not have any difficulty in assigning
numbers despite the wide variety of tasks. They also constructed a power function and used it to predict
the loads of dual tasks from single task scores. They found a high correlation between resource
requirements (derived from subjective scores) and an index of task difficulty, but low correlations with
reaction time performance measures.
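
As a generic illustration of what fitting a power function involves, the sketch below estimates the constants a and b in S = a * D^b by least squares on the logarithms of both variables. It is not Gopher and Braune's analysis; the variable names and data are hypothetical, and the data were generated from a known function so the fit can be checked.

```python
import math

def fit_power_function(x, y):
    """Fit y = a * x**b by ordinary least squares in log-log coordinates."""
    log_x = [math.log(v) for v in x]
    log_y = [math.log(v) for v in y]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    b = sxy / sxx                       # exponent = slope in log-log space
    a = math.exp(mean_y - b * mean_x)   # scale constant from the intercept
    return a, b

# Hypothetical demand levels and subjective estimates following S = 2 * D**0.5.
demand = [1, 4, 9, 16]
subjective = [2, 4, 6, 8]
a, b = fit_power_function(demand, subjective)
print(a, b)
```

The exponent b describes how quickly subjective estimates grow with the demand index; values below 1 indicate compression, values above 1 expansion.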

The magnitude estimation method was also used by Kramer, Sirevaag and Braune (1987) to collect
subjective ratings of OWL in a single-engine, fixed-base simulator. A five-minute straight and level flight
path segment was used as the modulus and assigned a value of 100. These researchers found that the
subjective ratings corresponded well to flight task performance, as measured by flight heading and
altitude deviations, reaction time, and accuracy of the auditory secondary probe task. Both subjective
ratings and performance measures differentiated between easy and difficult flights and between flight
segments. However, the pilots' workload estimates indicate that holding patterns, takeoffs, and landings
were equally difficult, while performance measures indicated holding patterns and straight-and-level flight were
done better than takeoffs and landings.

One of the ways of using magnitude estimation is to have a standard reference task (a modulus) against
which relative judgments of workload are made. Rather than allowing subjects to make relative judgments
against their own internal reference developed from past experience, Hart and Staveland (1987) suggest
that a standard reference task may reduce between-subject variability. They suggest that reference tasks
may assist in providing a stable judgment set from which to make estimates of subjective workload. They
also suggest that the reference task should share elements with the experimental tasks to be performed,
because the workload of different tasks may be caused by different task dimensions. The reference task
should provide the opportunity to make comparable judgments.

O'Donnell and Eggemeier (1986) review workload assessment that has used magnitude estimation
and they conclude that the data support the estimates obtained from magnitude estimation techniques.
They do caution, however, that the use of magnitude estimation may have practical limitations. For
example, subjects may not be able to retain and use the same modulus over time. In addition,
counterbalanced presentation of stimuli, normally used in laboratory magnitude estimation experiments,
may not be possible. O'Donnell and Eggemeier (1986) suggest that the impact of these potential
problems should be identified before magnitude estimation techniques are used in operational
environments.

Paired comparisons. Other psychometric techniques that might be used for workload
assessment include paired comparisons (also called Thurstone scaling techniques). In the paired
comparison technique, subjects choose one of a pair of stimuli which has more of the characteristic being
judged. The number of comparisons made is n(n-1)/2, where n is the number of stimuli. Therefore, the
number of comparisons can become quite large as the number of stimuli increases. Five stimuli would
require 10 comparisons; eight stimuli would require 28 comparisons; and ten would require 45
comparisons. Scales are derived from the number of times a stimulus is judged to have more of the
relevant characteristic than the other stimuli.
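
The growth in the number of comparisons, and the tallying from which a scale is derived, can be checked with a few lines. This is a minimal sketch: the task names and the hypothetical subject's choices are illustrative assumptions, and only the raw tallying step is shown, not a full Thurstone scaling analysis.

```python
from itertools import combinations

def number_of_comparisons(n):
    """n(n-1)/2 pairs for n stimuli."""
    return n * (n - 1) // 2

def tally_choices(stimuli, choose):
    """Count how often each stimulus is judged to have more of the attribute."""
    counts = {s: 0 for s in stimuli}
    for a, b in combinations(stimuli, 2):
        counts[choose(a, b)] += 1
    return counts

# Hypothetical tasks and a hypothetical subject whose choices follow these loads.
perceived_load = {"hover": 3, "cruise": 1, "landing": 4, "takeoff": 2}
counts = tally_choices(
    list(perceived_load),
    lambda a, b: a if perceived_load[a] > perceived_load[b] else b,
)
ordering = sorted(counts, key=counts.get)  # lowest to highest judged workload

for n in (5, 8, 10):
    print(n, number_of_comparisons(n))
print(counts, ordering)
```

The same formula gives the 36 and 15 paired comparisons required for nine and six dimensions, respectively.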

Equal-appearing intervals. The technique of equal-appearing intervals has the subject assign the
stimulus to one of several categories depending on how much of a characteristic the stimulus is judged to
possess. Eleven categories are often used. The subjects are also instructed to keep the distance
between any two categories equal to the distance between any other two. Hicks and Wierwille (1979)
applied this technique in a study of workload in an automobile simulator. Results indicated significant
differences between all task difficulty levels, which indicates the method is sensitive to workload variations.
Masline (1986) found the sensitivity of the equal-interval technique to be equivalent to magnitude
estimation and to SWAT. He concluded that the equal-interval scaling was the easiest of the three to
administer. Masline cautions that there is a strong tendency for operators to assign stimuli so that all
categories are used about equally often, which may bias ratings.

Summary. The psychometric techniques of magnitude estimation, paired comparisons, and
equal-appearing intervals have been used as workload assessment techniques. In general, studies have
indicated sensitivity of the methods to varying task difficulty levels. The psychometric techniques appear
to offer viable alternatives for subjective workload assessment, although reservations about the use of
these techniques in operational environments were expressed by O'Donnell and Eggemeier (1986).
More information on the development and procedures for using these techniques is required before they
are fully recommended for Army applications. For further information, the reader should consult texts on
psychophysical methods (e.g., Edwards, 1957; Stevens, 1975; Gescheider, 1985) as well as reviews of
these techniques as applied to workload assessment (e.g., O'Donnell & Eggemeier, 1986).

Subjective Workload Assessment Technique (SWAT)

The Subjective Workload Assessment Technique (SWAT) is a subjective rating technique developed
by the U.S. Air Force Armstrong Aerospace Medical Research Laboratory (AAMRL) at Wright-Patterson Air Force
Base. It uses conjoint measurement and scaling techniques (Krantz & Tversky, 1971; Nygren, 1982) to
develop a rating scale with interval properties. SWAT uses the three dimensions of time load, mental
effort load, and psychological stress load to assess workload. These were adapted from the workload
definition developed by Sheridan and Simpson (1979). For each of the three dimensions, there are three
levels which are operationally defined. These are shown in Table 5-3. Time load refers to the relative
amount of time available to the operator (AAMRL, 1987) and the percentage of time an operator is busy
(Eggemeier, McGhee, & Reid, 1983), and includes elements such as overlap of tasks and task
interruption. Mental effort refers to the amount of attention or concentration directed toward the task,
independent of time considerations. Psychological stress is the degree to which confusion, frustration
and/or anxiety is present and adds to the subjective workload of the operator. Factors that may increase
stress and elevate distraction from the task include personal factors such as motivation, fear, fatigue or
environmental factors such as temperature, noise or vibration (AAMRL, 1987).

There are two distinct steps in the use of SWAT. The first is called scale development. Twenty-seven
cards contain all possible combinations of the three levels of each of the three dimensions. The cards are
sorted by the individual operators into the rank order that reflects their perception of increasing workload.
The SWAT User's Guide (AAMRL, 1987) suggests that the 27 cards be first sorted into three piles of nine
each, and then each pile ordered 1 through 9 representing lowest to highest workload. The order of the
sorted cards is then processed via conjoint scaling procedures to develop a scale with interval

Table 5-3. Operational definitions of the three SWAT dimensions (AAMRL, 1987).


DIMENSION / LEVELS

I. TIME LOAD

[1] Often have spare time. Interruptions or overlap among activities occur
    infrequently or not at all.

[2] Occasionally have spare time. Interruptions or overlap among activities
    occur frequently.

[3] Almost never have spare time. Interruptions or overlap among activities are
    very frequent, or occur all the time.

II. MENTAL EFFORT

[1] Very little conscious mental effort or concentration required. Activity is almost
    automatic, requiring little or no attention.

[2] Moderate conscious mental effort or concentration required. Complexity of
    activity is moderately high due to uncertainty, unpredictability, or unfamiliarity.
    Considerable attention required.

[3] Extensive mental effort and concentration are necessary. Very complex
    activity requiring total attention.

III. PSYCHOLOGICAL STRESS

[1] Little confusion, frustration or anxiety exists and can be easily accommodated.

[2] Moderate stress due to confusion, frustration or anxiety. Noticeably adds
    to workload. Significant compensation is required to maintain adequate
    performance.

[3] High to very intense stress due to confusion, frustration or anxiety. High to
    extreme determination and self-control required.


properties. The developed numerical scale runs from 0 to 100, with 0 signifying no workload, or the
lowest ranked condition on each of the three dimensions (usually 1,1,1), and 100 corresponding to the
maximum workload, or the highest ranked on each of the three dimensions (usually 3,3,3). Other
combinations of ratings on the three dimensions (e.g., 2,3,2) would be assigned a corresponding scale
number (e.g., 75). The scale value corresponding to each combination of ratings will be different for each
individual, depending on the way the cards are sorted. An illustration of the mapping from a
three-dimensional to a unidimensional scale is shown in Figure 5-7.


Figure 5-7. Subjective Workload Assessment Technique (SWAT) uses conjoint analysis to change
each individual's ranks to a unique interval scale. In this individual example, the rank of 1, given to the
combination 1,1,1, is reflected as the lowest (0) value on the Workload Scale. The rank of 27, given to
the combination 3,3,3, is reflected as the highest (100) value on the Workload Scale. Intermediate rank
values of 3 and 18 given to the combinations 2,1,1 and 2,3,2 respectively, are reflected as intermediate
workload values (i.e., 20 and 75, respectively) dependent on the individual workload scale developed.
(The illustration is adapted from Gidcumb, 1985.)
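
The mapping from a card sort to a 0 to 100 scale can be made concrete with a rough sketch. The code below enumerates the 27 cards and then applies a simplified additive approximation: each level of each dimension is given the mean rank of the cards that carry it, the three values are summed for each card, and the sums are rescaled to 0 to 100. This is an assumption made for illustration only; it is not the conjoint scaling procedure of the SWAT User's Guide, and the card sort is that of a hypothetical operator for whom time load dominates.

```python
from itertools import product

LEVELS = (1, 2, 3)
# Each card is (time load, mental effort, psychological stress): 3**3 = 27 cards.
CARDS = list(product(LEVELS, repeat=3))

def additive_scale(ranked_cards):
    """Approximate a 0-100 workload scale from a card sort.

    ranked_cards lists all 27 cards from lowest to highest perceived workload.
    """
    rank = {card: position + 1 for position, card in enumerate(ranked_cards)}
    # Effect of a level = mean rank of the nine cards carrying that level.
    effect = {}
    for dim in range(3):
        for level in LEVELS:
            ranks = [rank[c] for c in CARDS if c[dim] == level]
            effect[(dim, level)] = sum(ranks) / len(ranks)
    raw = {c: sum(effect[(dim, c[dim])] for dim in range(3)) for c in CARDS}
    low, high = min(raw.values()), max(raw.values())
    return {c: 100 * (raw[c] - low) / (high - low) for c in CARDS}

# Hypothetical sort: time load first, then mental effort, then stress.
hypothetical_sort = sorted(CARDS)
scale = additive_scale(hypothetical_sort)
print(scale[(1, 1, 1)], scale[(2, 3, 2)], scale[(3, 3, 3)])
```

A different card sort yields different intermediate values, which is why the scale value for a given combination differs from one individual to the next.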


The second step in SWAT is the event scoring, that is, the actual rating of workload for a given task or
mission segment. For the defined task or segment, the operator is asked to assign a level (1, 2 or 3) to
each of the three dimensions of time load, mental effort, and psychological stress. It has been found that
the order in which the three dimensions are presented does not affect the rankings (Acton & Colle, 1984),
but it is suggested that the order in which they are ranked be kept constant to reduce confusion (AAMRL,
1987). This rating is converted to one of the 27 numerical scores (described above) between 0 and 100
which are computed during scale development.

Since the initial development of SWAT, there have been many refinements, suggestions for
implementation, analysis, and interpretation. One issue of concern is the difference between individual
and group scale development. (The group scale is constructed using group mean rankings.) In an early
SWAT study, there were high coefficients of concordance for the rankings of four different groups of
operators ranging between .76 and .62 (Reid, Shingledecker, & Eggemeier, 1981). Because of the high
level of agreement, a group scale was developed for each different experiment. With group scales, the
idiosyncrasies of individual sorts tend to average out. Conversely, the group scale may also hide some of
the individual differences in the perception of workload. An alternative approach has been developed
that permits scale development for homogeneous subgroupings of individuals called prototypes (Reid,
Eggemeier, & Nygren, 1982). The prototypes are based upon which one of the three dimensions is the
overriding factor in their rankings. For example, if time load is considered as most important, the rankings
may reflect a certain level of time held constant while the other two dimensions are varied across the full
range of possibilities. The SWAT User's Guide (AAMRL, 1987) discusses specific procedures and
approaches to use in determining how individuals should be grouped together into time, effort or stress
prototypes. The prototype approach offers an increased sensitivity to individual differences as compared
with the group approach.
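
Agreement among card sorts of the kind cited above is commonly summarized with Kendall's coefficient of concordance, W. The sketch below assumes that statistic (the text does not name the coefficient used) and uses the standard formula without a correction for ties, W = 12S / (m^2 (n^3 - n)), where m is the number of operators, n the number of cards, and S the sum of squared deviations of the cards' rank sums from their mean. The rankings are hypothetical and use four items rather than 27 cards to keep the example short.

```python
def kendalls_w(rankings):
    """Kendall's coefficient of concordance for m complete rankings of n items.

    rankings is a list of m lists; rankings[j][i] is the rank (1..n) that
    operator j gave to item i. No correction for tied ranks is applied.
    """
    m = len(rankings)
    n = len(rankings[0])
    rank_sums = [sum(r[i] for r in rankings) for i in range(n)]
    mean_rank_sum = m * (n + 1) / 2
    s = sum((total - mean_rank_sum) ** 2 for total in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Three hypothetical operators in perfect agreement.
agree = kendalls_w([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]])

# Two hypothetical operators with exactly opposite orderings.
oppose = kendalls_w([[1, 2, 3, 4], [4, 3, 2, 1]])
print(agree, oppose)
```

W runs from 0 (no agreement) to 1 (identical rankings), so high values are what would justify pooling sorts into a group scale.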

Time, effort and stress may be individually examined as workload components, whether individual,
group, or prototype sorts are used. How the particular dimensions are rated may be useful in determining
the specific design features that may be contributing to the workload perception. If the time load
dimension is judged to be very high while the other two are not scored as high, for example, this might
suggest that a design element in the time domain (e.g., data presentation rate or required response time)
is the most important consideration for workload in that task or mission segment (Eggemeier, McGhee, &
Reid, 1983).
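
Event scoring and the per-dimension examination described above can be sketched together. The lookup table below contains only the four combinations whose scale values are given in the Figure 5-7 example; a real table would hold all 27 values produced by scale development. The segment ratings are hypothetical, and the function names are introduced here for illustration.

```python
from statistics import mean

DIMENSION_NAMES = ("time load", "mental effort", "psychological stress")

# Partial scale for one individual, using the values quoted for Figure 5-7.
EXAMPLE_SCALE = {(1, 1, 1): 0, (2, 1, 1): 20, (2, 3, 2): 75, (3, 3, 3): 100}

def score_event(rating, scale):
    """Convert a (time, effort, stress) rating to its 0-100 scale value."""
    return scale[rating]

def dimension_means(event_ratings):
    """Mean level (1-3) assigned to each dimension across rated segments."""
    return {
        name: mean(rating[i] for rating in event_ratings)
        for i, name in enumerate(DIMENSION_NAMES)
    }

# Hypothetical ratings for three mission segments.
segments = [(3, 1, 1), (3, 2, 1), (2, 1, 1)]
means = dimension_means(segments)
dominant = max(means, key=means.get)

print(score_event((2, 3, 2), EXAMPLE_SCALE))
print(means, dominant)
```

In this hypothetical case the time load ratings stand out, which is the pattern that would direct attention to time-related design elements.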

SWAT meets many of the practical considerations for use of workload assessment techniques. As with
other subjective techniques, such considerations include ease of implementation, high face validity,
operator acceptance, relative freedom from interference with the primary task (i.e., intrusiveness),
scorability (i.e., the degree to which it can be quantified), repeatability and quickness of administration
(Crabtree, Bateman, & Acton, 1984; Courtright & Kuperman, 1984).

There have been numerous studies of SWAT as a workload measurement technique in both laboratory
and applied settings. Laboratory studies have shown that SWAT is sensitive to differences in task
demands in critical tracking and simulated aircrew radio communication tasks (Reid et al., 1981);
continuous recall tasks (Potter & Acton, 1985); a spatial memory task (Eggemeier & Stadler, 1984); a
short-term memory task (Eggemeier, Crabtree, Zingg, Reid, & Shingledecker, 1982); simulated air-to-air
combat (Reid, Eggemeier, & Shingledecker, 1983, cited in Eggemeier, McGhee, & Reid, 1983); and a
probability monitoring task (Notestine, 1984).

Most of the applied studies have used SWAT in aviation applications. This is certainly not surprising
given  the  Air  Force  roots  of  SWAT  and  the  traditional  concern  with  pilot  workload.  Skelly  and  Purvis 
(1985)  used  SWAT  in  an  investigation  of  a  B-52  wartime  mission  simulation.  Haworth,  Bivens  and  Shively 
(1986)  used  SWAT  in  a  single  pilot  helicopter,  nap-of-the-earth  flight  simulation.  Gidcumb  (1985)  reports 
the  use  of  SWAT  in  several  Air  Force  applications.  Courtright  and  Kuperman  (1985)  discuss  the  use  of 
SWAT  in  Air  Force  test  and  evaluation  environments  and  found  the  technique  understandable  and 
accepted  by  both  testers  and  subjects.  Schick  and  Hann  (1987)  used  a  German-language  version  of 
SWAT  to  assess  workload  in  a  moving-base  cockpit  simulator.  They  report  that  SWAT  was  sensitive  to 
varied  task  difficulty. 

However, application of SWAT has not been limited to aviation environments. Crabtree, Bateman and
Acton (1984) used SWAT in an examination of over 20 command, control and communication (C3) tasks
(the tasks are not described in detail). SWAT ratings were also obtained in a study of the effects of
experience level on the performance of nuclear power control room crews (Beare & Dorris, 1984).

The reliability of the SWAT card sorts has been typically found to be high: the correlation ranged from
.77 to 1.00 for four pilots for pre- and post-test card sorts (Gidcumb, 1985). Subjects have produced card
sorts as far apart as a year, and over eighty percent of the subjects produced sorts that correlated .90 or
above (AAMRL, 1987). These correlations suggest stable workload judgments will be made across time.

With SWAT, as with other subjective techniques, there is a question regarding the effects of delays
between the workload experience and the rating. Some research has specifically looked at this question
and concluded that although there were some changes in ratings, short delays of 15-30 minutes do not
affect the overall mean ratings (Eggemeier, Crabtree, & LaPointe, 1983). This may have been due to a
counterbalancing effect where some subjects increased ratings and others decreased ratings relative to
the baseline. Eggemeier, Melville and Crabtree (1984) found that neither 14-minute delays nor
intervening tasks affected subjective workload ratings. However, delayed ratings should not be expected
to be exactly the same as ratings given immediately after performance. This is particularly important if the
absolute value is desired, but not as important if relative values are desired for comparison between two
alternative task or equipment configurations. Additionally, it was found that the most difficult intervening
task produced the most discrepant ratings. This finding is troubling because of the analogy that can be
drawn to applied studies. Often the reason for operators not providing ratings when asked is that they are
too busy with a difficult, high workload task. At the next occasion for rating, a difficult intervening task will
have occurred; therefore, the research suggests that the rating will be lower than it would have been
without the delay caused by the difficult task (Eggemeier, Melville, & Crabtree, 1984).

SWAT appears to be a valid measure of some aspects of workload, particularly those associated with
the operationally defined dimensions of time load, mental effort and psychological stress. SWAT has
been found to give results similar to those of other subjective methods such as NASA bipolar ratings
(Vidulich & Tsang, 1986; Haworth et al., 1986), the Modified Cooper-Harper scale (Warr et al., 1986), and
magnitude estimation and equal-interval scales (Masline, 1986), and to compare favorably with various
physiological measures (Albery, Repperger, Reid, Goodyear, & Roe, 1987).

Several practical observations and suggestions were made by Gidcumb (1985) in a report that used
SWAT as a workload measure in several Air Force applications. He concluded that "SWAT appears to be
an accurate measure of the workload experienced by the aircrew participating in the tests surveyed"
(p. V-1). However, several suggestions were made to improve the use of SWAT in applied settings. During
the introductory briefing to SWAT, more emphasis should be placed on what will be expected of the
operators. There were observations that some of the operators approached the card sorting task very
casually, as evidenced by cursory card sorts. Gidcumb suggests that operators be fully introduced to the
benefits of SWAT to them personally and the importance of the card sort to the entire procedure. The
motivation of the operators is a critical element in the success of the card sort.

SWAT administrators agreed that the operators should be thoroughly familiar with the rating
procedures. After six to ten SWAT ratings, aircrew felt confident that the ratings were reflecting their
workload perceptions. After 15 ratings, the aircrew reported that the ratings interfered little with their other
duties. Practice with the rating procedure and the operational definitions of the dimension levels is very
important in obtaining accurate workload measures. Without adequate practice, a learning effect may
distort the ratings, and both relative comparisons and absolute measures of workload will have only limited
value.

There were other comments dealing with the gathering of ratings. Some pilots refused to consider
real-time ratings because of their concern that it would impose an additional task. An alternative procedure
was used where the pilots would review mission videotapes for post-flight ratings. Another way to handle
missing ratings would be to assign the highest rating (3,3,3) for real-time segments that were missed
(AAMRL, 1987). The operators also had trouble deciding how to rate segments that were impossible to
perform or were performed incorrectly. The suggestion was made that the SWAT administrator needs to
be explicit about what to rate and what kind of ratings should be assigned to impossible or differently
performed tasks.

Schick and Hann (1987) suggest that SWAT data collection be planned carefully so that obtaining
ratings does not interfere with flight duties. Therefore, event-related data collection (as has been used in
most studies) appears to be a better alternative than data collection at fixed time intervals. Interestingly,
this is different from Gidcumb's (1985) recommendation that further research should be done on
time-based rather than task-based rating segments.

Another observation was that in these applications (and, it can be inferred, most operational
applications), only a small number of operators and data gathering missions are available, so sample
sizes are small. Parametric statistical analyses may be inappropriate, and other descriptive or comparison
techniques may be more appropriate in such cases.
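
For samples this small, one option is an exact nonparametric comparison. The sketch below applies a two-sided exact sign test to paired ratings from the same operators under two configurations. The choice of the sign test is an illustrative assumption; the sources reviewed here do not name a specific alternative. The function name `sign_test_p_value` and the ratings are hypothetical.

```python
from math import comb


def sign_test_p_value(pairs):
    """Two-sided exact sign test for paired ratings (ties are dropped)."""
    diffs = [after - before for before, after in pairs if after != before]
    n = len(diffs)
    if n == 0:
        return 1.0
    increases = sum(1 for d in diffs if d > 0)
    # Probability of a split at least this lopsided under a fair coin.
    tail = min(increases, n - increases)
    p = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, p)


# Hypothetical workload ratings (0-100) from six operators:
# (configuration A, configuration B).
ratings = [(35, 52), (40, 61), (28, 45), (50, 58), (33, 47), (44, 66)]

print(sign_test_p_value(ratings))
```

With only six operators, even a unanimous direction of change gives a p-value just above .03, which shows how little room such samples leave for inferential statistics.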

Other issues involve the expansion of the current rating scheme from the three current levels to,
perhaps, five. Although this might provide greater rating sensitivity and avoid floor or ceiling effects (as
suggested by Potter & Acton, 1985), the card sort (as currently administered) with five levels might
become unmanageable for subjects. However, finding some approach to increase the number of levels
may yield benefits. The use of partial sorts, consisting of a subset of the original number of combinations,
may be a possible method, although this has not as yet been thoroughly developed (Nygren, 1985).

There is also a question of the ability needed to perform the card sort procedure. It is recognized that
the card sort is the key to successful use of SWAT and that motivation does play a role in how carefully the
cards are sorted. The cards contain combinations of verbal descriptions, and there is some anecdotal
evidence to suggest that individuals with low verbal skills may have difficulty in the sorting task. A possible
solution would be the use of graphical representations. This is an area for further investigation; empirical
data are needed to determine whether it is a problem. Solutions will be proposed and investigated
if this is proven to be a problem (G. B. Reid, personal communication, July 9, 1987).

Several other concerns have been raised in addition to the potential problems associated with the card
sorting procedure. It has been suggested (Derrick, 1983; Hart, 1986a) that three factors may not be
enough to adequately characterize workload. The three factors, it has been suggested, may not be
orthogonal (Boyd, 1983). Hart (1986a) argues that the assumption that people can accurately
distinguish between the 27 combinations may not be true. Boyd (1983) suggests that there might be
high interrater reliabilities at the extremes, but the intermediate ratings may be less reliable. A further
concern is that scales with fewer than six or seven divisions may have response nonlinearities near the
endpoints (Hart & Staveland, 1987).

Summary. The Subjective Workload Assessment Technique (SWAT) uses conjoint analysis to
obtain a workload rating scale with interval properties. SWAT uses the three dimensions of perceived time
load, mental effort, and psychological stress to assess OWL. Both scale development and
event-scoring procedures are used. These provide an individual rank order of dimensions and ratings on the
three dimensions for a given task or task segment. SWAT has been shown to be both valid and reliable as a
measure of workload. SWAT has been used in both laboratory and applied settings and found to be
sensitive to a variety of task demands. Because of the multidimensional nature of SWAT, it is possible to
use the individual dimension scales as diagnostic OWL tools. Care must be taken in the card sort and
event-scoring implementation to obtain accurate workload measures. SWAT appears to be a useful
technique for subjective workload assessment in Army applications.

Other  Subjective  Rating  Scales 

There are other rating scales that have been developed for workload assessment. Often scales are
created for specific studies. This has led Shingledecker (1983, as cited in Potter, 1986) to suggest that
there may be almost as many scales and checklists as there are studies that use subjective assessment
techniques. Of these many subjective techniques, a few are presented here as examples of other types
of rating scales that have been developed and used in workload assessment applications.

The Pilot Subjective Evaluation. The Pilot Subjective Evaluation (PSE) process was developed
by Boeing for use in the workload evaluation of the Boeing 767 (Fadden, 1982; Ruggiero & Fadden,
1987). The PSE is shown in Figure 5-8. It includes both seven-point rating scales and an accompanying
questionnaire. A validation study of the PSE is reported, although details are not given in either of these
two papers. The particularly interesting aspect of these scales is the use of a reference airplane (chosen
by the pilot) for a comparative evaluation of workload. Basically, the pilots rated whether operation of the
767 was more demanding than, the same as, or less demanding than the reference aircraft in terms of
mental effort, physical difficulty, and time required. Ratings of greater workload indicate areas for design
improvements. An interview, held at the end of each day, provided the opportunity to gather more
information on the items rated worse than the reference airplane workload.

The Dynamic Workload Scale. The Dynamic Workload Scale is another rating scale developed for
an aircraft certification program and has been used by Airbus Industrie (Speyer, Fort, Fouillot, & Blomberg,
1987). As seen in Table 5-4, the scale is a seven-point scale. The technique includes workload
assessment by both the pilot and an observer-pilot. The scale is administered without defining workload,
allowing the pilot and observer to be guided by their own interpretation of workload. However, the criteria
for the raters to consider are reserve capacity, interruptions and effort or stress. The observer makes a
rating whenever the workload has changed since the last rating or when five or more minutes have
passed. A cue is then given to the pilot to make a rating. The primary analyses of these data were
cumulative rating distributions. Concordance between pilot and observer ratings was also examined and
appeared high. Ratings are also plotted along a timeline. Speyer et al. (1987) report a shift in the median
of the distribution of ratings as workload increased, implying a sensitive measure, although no further
details are available.

Figure 5-8. The Pilot Subjective Evaluation Scale developed by Boeing.


Table 5-4. The Dynamic Workload Scale used by Airbus Industrie for aircraft certification (Speyer et al.,
1987).

Workload        |                   Criteria                      |
Assessment      | Reserve Capacity  Interruptions  Effort or Stress | Appreciation
----------------+---------------------------------------------------+--------------------------------
Light      2    | Ample                                             | Very Acceptable
Moderate   3    | Adequate          Some                            | Well Acceptable
Fair       4    | Sufficient        Recurring      Not Undue        | Acceptable
High       5    | Reduced           Repetitive     Marked           | High but Acceptable
Heavy      6    | Little            Frequent       Significant      | Just Acceptable
Extreme    7    | None              Continuous     Acute            | Not Acceptable Continuously
Supreme    8    | Impairment        Impairment     Impairment       | Not Acceptable Instantaneously

Analytic Hierarchy Process. A scaling procedure which uses paired comparisons is based on the
Analytic Hierarchy Process (AHP) developed by Saaty (Lidderdale, 1987). The procedure was aimed at
obtaining relative estimates of workload after flights. All possible pairs of tasks or task segments are
presented to the operator (in this case, the pilot). If one of the pair is judged to have higher workload, the
operator is asked to judge by how much on a scale from 1 to 5:

1 = equal workload
2 = slightly higher workload
3 = moderately higher workload
4 = very much higher workload
5 = extremely high relative workload.

Through mathematical procedures (Lidderdale, 1987; Lidderdale & King, 1985; Saaty, 1980), the ratings
can be used to obtain relative judgments of mission element workload. Visual inspection of graphs of
workload assessments of the same mission elements obtained by the Bedford scale and the AHP method
shows similar results, and a rank order analysis gave high correlations (Lidderdale, 1987).
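
The step from paired judgments to relative workload values can be sketched as follows. This is a minimal illustration that assumes the principal-eigenvector approach generally associated with Saaty's AHP, computed here by power iteration; the exact computation used in the cited studies is not described in this review. The function name `ahp_weights`, the mission element names, and the judgments are hypothetical.

```python
def ahp_weights(elements, judgments, iterations=100):
    """Relative workload weights from paired judgments on the 1-5 scale.

    judgments maps (higher, lower) -> score, where score says how much more
    workload the first element was judged to have than the second.
    """
    n = len(elements)
    index = {name: i for i, name in enumerate(elements)}
    # Reciprocal comparison matrix with ones on the diagonal.
    matrix = [[1.0] * n for _ in range(n)]
    for (higher, lower), score in judgments.items():
        i, j = index[higher], index[lower]
        matrix[i][j] = float(score)
        matrix[j][i] = 1.0 / score
    # Power iteration toward the principal eigenvector, normalized to sum to 1.
    weights = [1.0 / n] * n
    for _ in range(iterations):
        weights = [
            sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return dict(zip(elements, weights))


# Hypothetical mission elements and post-flight judgments.
elements = ["takeoff", "cruise", "landing"]
judgments = {
    ("takeoff", "cruise"): 3,   # takeoff moderately higher than cruise
    ("landing", "cruise"): 4,   # landing very much higher than cruise
    ("landing", "takeoff"): 2,  # landing slightly higher than takeoff
}

for name, weight in ahp_weights(elements, judgments).items():
    print(f"{name}: {weight:.3f}")
```

The weights sum to one and express only relative standing among the elements compared, which is why the AHP is described as a relative rather than an absolute judgment method.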

Vidulich and Tsang (1987) classify the AHP as a relative judgment method as opposed to the absolute
judgments of workload that are obtained with NASA-TLX or a unidimensional workload scale. All three
OWL scales were used in a single-axis compensatory tracking task with control order determining the level
of task difficulty and visual or auditory presentation. All three OWL measures exhibited close agreement in
discriminating the task variables, although the AHP showed the greatest validity (as measured by
correspondence to performance) and reliability (as measured by test-retest correlations). However, topics
of concern include how well relative judgments could be made across more varied tasks, as well as the
possibility of subjects forgetting details or creating their own hypotheses about task relationships.
Vidulich and Tsang suggest that further research with the AHP should be pursued.

Workload/Compensation/Interference/Technical Effectiveness. The Mission Operability
Assessment Technique (MOAT) is another technique that uses conjoint scaling methods (Donnell, 1979;
Helm & Donnell, 1979). The MOAT process was designed to evaluate overall system operability,
specifically in aviation environments. As part of the MOAT process, the Workload/Compensation/
Interference/Technical Effectiveness (WCI/TE) matrix and rating scale was developed. The WCI/TE is a
4 x 4 matrix which describes technical effectiveness of the system (4 levels) and pilot workload,
compensation and interference (4 levels) and is shown in Figure 5-9. As in all conjoint scaling techniques,
pilots first rank order the 16 matrix elements and then specific tasks are rated. The task rating can then be
transformed to an interval value from 0 to 100.

Some data on the sensitivity of the WCI/TE are available from work done by Wierwille and Connor
(1983). The study used psychomotor tasks in a moving-based flight task simulator. The WCI/TE scale was
found to significantly differentiate between three levels of task difficulty. Wierwille et al. (1985) also report
the WCI/TE to be generally sensitive to psychomotor, perceptual, and mediational tasks (the WCI/TE was
not tested with communication tasks). O'Donnell and Eggemeier (1986) suggest that MOAT was
specifically intended for piloting tasks and was not intended as a direct measure of workload.

Summary. These techniques represent several additional subjective workload assessment tools.
The PSE and the Dynamic Workload Scale were developed specifically for civilian aircraft certification and
provide interesting examples of applied techniques. The WCI/TE scale has been found to be a sensitive
workload measure (Wierwille et al., 1985), but currently appears to be of interest only as the conjoint
scaling predecessor to SWAT.

[Figure 5-9 matrix. Rows give four levels of technical effectiveness: Multiple Tasks Integrated; Design
Enhances Specific Task Accomplishment; Adequate Performance Achievable, Design Sufficient to Specific
Task; Inadequate Performance Due to Technical Design. Columns give four levels of workload,
compensation and interference: Extreme; High; Moderate; Low.]

Figure 5-9. The WCI/TE scale matrix (Donnell, 1979; Helm & Donnell, 1979).

The AHP is a technique that has recently been used for workload assessment. Lidderdale (1987)
found it useful in an applied setting, while Vidulich and Tsang (1987) found it more reliable and valid than
two other scales in a single- and dual-task laboratory experiment. Sufficient information is not yet available
to make judgments on the AHP for Army OWL assessment. Further research is needed.

Comparisons Among Rating Scales

The results of comparisons among different rating scales have been briefly described in previous
sections. There have been several additional studies that have directly compared more than one
subjective measure of workload. Some have used multiple measures as a battery of workload assessment
tools; others have performed research intended as comparisons and validation studies of the various
techniques. Table 5-5 presents a matrix of the subjective OWL techniques that have been discussed in
the previous sections. Within the matrix, published research that has used more than one subjective
technique is listed. Although it is believed that the primary comparative studies are listed, some research
that could have been included in this table may not have been identified. However, the table is
representative of the research that has been done and the gaps that still exist.

The literature altogether indicates that the techniques that have been compared correspond well.
Generally, the same rank order of task difficulty is obtained by each technique. Each of the studies listed
in Table 5-5 is briefly presented:

• Hart and Staveland (1987) describe the development of NASA-TLX as a refined
edition of the NASA-bipolar scales. NASA-TLX was developed to reduce the number
of scales (from ten to six) by selecting those dimensions that best discriminated
between task variables, that provided independent information, and were associated
with overall workload ratings. NASA-TLX and bipolar scales were not compared as
such, but the relationship between the scales and supporting empirical data are
presented in detail.

• Haworth, Bivens and Shively (1986) investigated workload in single-pilot operation in
NOE helicopter missions. They used the Cooper-Harper scale, the NASA-Bipolar
scales and SWAT to assess handling qualities (using CH) and workload. The
correlation between the Bipolar and SWAT was .67, while CH was significantly
correlated to NASA-Bipolar and SWAT measures (.75 and .79, respectively). Both
subjective techniques indicated a higher average workload for one pilot as compared
to two pilots.

• Lidderdale (1987) used the Bedford scale to obtain real-time OWL ratings for an
advanced combat aircraft with a two-person crew during low level maneuvers. The
Saaty AHP was used to obtain OWL ratings in a post-flight context. Visual inspection
of graphs of workload ratings for each flight segment by both techniques shows similar
results. The Spearman rank order correlation between the Bedford and AHP scores
shows significant correlation coefficients of .66 for the pilots and .85 for the
navigators. The AHP obtains a relative, post-flight assessment, while the Bedford
scale is considered an absolute scale. The author suggests, however, that the
Bedford scale may be considered relative in that workload assessments are probably
made from a baseline of all previous experience.

Table 5-5. Workload studies that have used more than one rating scale.

• Masline (1986) used SWAT, magnitude estimation, and equal-appearing interval
scales to assess workload of a continuous recall task where presentation rate, number
of digits and the number of positions back to recall were varied. Results indicated
equal sensitivity among the three techniques. Correlations between subjective and
performance measures were significant. Masline compared the three techniques with
other criteria; all three appeared equivalent in terms of sensitivity, predictive
capability, obtrusiveness and operator acceptance. However, SWAT appeared to
have greater diagnosticity because of its multidimensional nature. The easiest
technique to administer was the equal-interval scale.

• Tsang and Johnson (1987) used a battery of three subjective measures to assess
workload in several manual and semi-automated tasks. The NASA-TLX scale, the
Bedford scale, and a unidimensional overall workload scale were used. The NASA-TLX
and overall workload scales displayed very similar trends for the different tasks.
Interestingly, the operator workload ratings showed a training effect evidenced by a
decrease in ratings in later sessions (i.e., over three sessions). The authors suggest
these findings demonstrate the sensitivity and robustness of these measures.

The slightly different ratings obtained from the Bedford scale were interpreted, in light
of multiple-resource theory (Wickens, 1980), as supportive of the ability of the
Bedford scale to assess what it claims to assess, that is, spare capacity. The authors
do caution that these conclusions are based on limited data from only six subjects.

• Vidulich and Tsang (1986) (see also Vidulich & Tsang, 1985a, 1985b) used both
NASA bipolar scales and SWAT ratings to assess workload in a laboratory study using
tracking and spatial transformation tasks. Both techniques displayed sensitivity to the
various task demands and, in general, provided similar results. Haworth et al. (1986)
also found a significant correlation between the techniques (r = .67), but it is not as
high as that found by Vidulich and Tsang (r = .76). However, specific differences
were found. A major difference was that the between-subject variability was
consistently lower for the NASA bipolar ratings than for SWAT. It was suggested that
even with the high level of concordance between subjects' rank orderings, the SWAT
scale development still represents a group average. For the bipolar scales, however,
the weighting procedure individualizes the workload score.

The relative ease of use of NASA bipolar and SWAT were also compared. SWAT can
be used in real-time data collection as it only requires choosing one of three levels for
the three dimensions. The NASA bipolar scales require a break in performance to
collect the ratings. However, the SWAT card sorting procedure takes at least 20
minutes and may take as long as one hour. It was suggested that the NASA workload
parameter comparisons were easier to perform and require about 10 minutes to
complete the 36 paired comparisons.

Neither technique was able to detect resource competition effects in dual-task
situations, in response execution processing demands, or in the dynamics of
difficulty changes. It was not certain if this resulted from inherent limitations in
subjective methods or from the limitations of these two techniques in particular.

• Vidulich and Tsang (1987) classify the AHP as a relative judgment method as
opposed to the absolute judgments of workload that are obtained with NASA-TLX or
a unidimensional workload scale. All three OWL scales were used in a single-axis
compensatory tracking task with control order determining the level of task difficulty
and visual or auditory presentation. All three OWL measures exhibited close
agreement in discriminating the task variables, although the AHP showed the
greatest validity (as measured by correspondence to performance) and reliability (as
measured by test-retest correlations). However, topics of concern include how well
relative judgments could be made across more varied tasks, as well as the possibility
of operators forgetting details or creating and reporting their own hypotheses about
task relationships. Vidulich and Tsang suggest that the AHP appears promising and
further research should be pursued.

• Warr, Colle and Reid (1986) used both SWAT and Modified CH to obtain workload
ratings in a laboratory setting for both a cognitive and a motor task, each with three
levels of difficulty. A linear transformation of SWAT scores was performed to make
them equivalent to the Modified CH scores (conventional rounding rules are
assumed). No statistical evidence was found that the scales differed in sensitivity.
However, both scales were found to discriminate between task difficulty levels. The
authors point out that, although the scales were found to be equally sensitive to the
task manipulations, the SWAT subscales might provide more diagnostic information in
an applied setting.

• Wierwille, Casali, Connor and Rahimi (1985) describe a study in which 14 workload
measures including two rating scales were evaluated using perceptual tasks in a
moving-based flight simulator. The perceptual tasks involved seeing warning lights
on the instrument control panel and responding via a pushbutton. Both the Modified
CH and the WCI/TE rating scales were used to obtain workload ratings. Both rating
scales showed a monotonic increase in ratings as the task difficulty increased across
three levels. The scales differentiated between high and the other two levels of task
difficulty. Little difference was found in the ability of the two scales to reflect changes
in workload.

Wierwille et al. (1985) also report a similar experiment using mediational tasks
comprised of finding geometric and mathematical solutions to various navigation
problems. Once found, the solutions were not implemented. The Modified CH and
WCI/TE were used to obtain OWL ratings. The Modified CH showed a monotonic
increase in workload ratings as difficulty increased while the WCI/TE showed no
difference between low and medium difficulty. The Modified CH would therefore be
recommended as the better rating scale for OWL assessment in mediational tasks.

• Wierwille and Connor (1983) evaluated 20 workload measures including two rating
scales using a psychomotor task in a moving-based flight task simulator. Both the
Cooper-Harper and the WCI/TE scales were used. The results indicate that both
rating scales significantly discriminated between each of three levels of task difficulty.
The normalized means of each difficulty level corresponded exactly in rank order and
closely in magnitude. Both scales were found sensitive to and are recommended for
workload measurement for psychomotor tasks.

• Wierwille, Skipper and Rieger (1984) conducted two studies to test whether the
sensitivity of the Modified CH could be increased by changing from a 10-point to a
15-point scale or by changing the format to computer-based or tabular form. Increasing
the categories from 10 to 15 did not consistently improve sensitivity. In general, they
concluded that the original Modified CH was the most consistently sensitive measure
of the five alternatives tested.
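
The linear transformation mentioned in the Warr, Colle and Reid (1986) entry above can be sketched as follows. The source does not give the transformation itself, so this sketch assumes that the SWAT scale endpoints (0 and 100) are mapped onto the endpoints of a 10-point Modified CH scale (1 and 10) and that "conventional rounding" means rounding halves upward. The function name `swat_to_mch` is hypothetical.

```python
import math


def swat_to_mch(swat_score):
    """Map a SWAT score (0-100) onto a 1-10 scale by a linear rescaling.

    Assumption: endpoints map to endpoints (0 -> 1, 100 -> 10).
    """
    if not 0 <= swat_score <= 100:
        raise ValueError("SWAT scores are expected to lie between 0 and 100")
    scaled = 1 + 9 * swat_score / 100
    # Round half up; Python's built-in round() rounds halves to the even value.
    return math.floor(scaled + 0.5)


for score in (0, 25, 50, 75, 100):
    print(score, "->", swat_to_mch(score))
```

Putting both measures on one scale in this way allows their means at each difficulty level to be compared directly, at the cost of discarding the three SWAT subscale values.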

Two observations can be made regarding the comparative studies of subjective workload measures.
The first observation is that, when the techniques are used for the same task, in general the results are
very similar. In all studies using two or more different techniques (excluding Wierwille et al., 1984 and Hart
& Staveland, 1987), the same rank order of difficulty was found for the task loadings. It appears that each
of the techniques described, when carefully planned and implemented, can provide useful assessments
of OWL.

The second observation is that more comparative work of this kind should be done. Following the
traditional lead from psychometrics, it is believed that factor analysis and other structural investigations
would provide a stronger basis for comparisons among techniques. Certainly, comparisons of the various
techniques are required for systems applications of interest to the Army.

Issues  Concerning  Subjective  Rating  Techniques 


Dissociation Between Subjective and Performance Measures

Subjective workload measurement and operator performance are generally highly correlated during the
early and middle stages of overload. Higher subjective ratings of workload are obtained in parallel with
worse performance. However, this pattern is not always the one obtained in OWL assessments. For
example, a dissociation between performance and subjective measures may occur where one task is
performed better than another but is perceived as having higher workload (Yeh & Wickens, 1984). The
idea of dissociation between subjective measures and performance is troubling because it indicates that
opposite conclusions might be drawn, depending on whether subjective or performance measures are
used for evaluation.

Although this continues to be an active research area, several conclusions of interest to practitioners
have been drawn. In general, subjective experiences are more accessible via introspection and verbal
reports when they are in working memory (Ericsson & Simon, 1980). Therefore, perceptual and cognitive
elements (i.e., those elements associated with working or long-term memory) will be more salient in
subjective reports than those elements that are associated with response execution, such as control
manipulation. Yeh and Wickens (1984) ran a series of experiments to investigate various hypotheses
regarding dissociation. Based on the results, they conclude that the strongest dissociation occurs
between single task difficulty and a dual task combination. When performing two tasks together,
increased cognitive management is needed for processing and coordination. Therefore, subjective
estimates will be higher than actual performance decrements in an easy dual task in comparison with a hard
single task. Yeh and Wickens also found that increasing the number of display elements increased the
subjective workload experience, although tracking performance was helped by the multielement predictor display.

Vidulich and Wickens (1986) make several observations on the implications of dissociation between
subjective OWL measures and performance. One observation is that the usefulness of subjective
measures may be reduced in detecting the individual workloads of single subtasks in a multitask
environment. Therefore, the authors suggest, subjective OWL measures should be obtained on
important subtasks in a single-task environment, if possible. Otherwise, perhaps the multitask
environment differences should be weighed more heavily than those found in a single-task situation.
Another source of dissociation is suggested to result from subjects' logical analysis of the situation.
Vidulich and Wickens, for example, slowed down the presentation rate of stimuli; this disturbed the
subjects' response rhythm and consequently degraded performance. However, the subjects perceived
the slower rate, logically deduced that this should cause less workload, and based their ratings on that
analysis. A third dissociation found by Vidulich and Wickens is that associated with increased motivation.
Higher levels of motivation (induced by bonus pay) aided performance but led to perceptions of higher
workload. The implications for operational settings include the importance of maintaining constant
motivation levels for different subjects and tasks during system evaluation. The authors emphasize the
importance of subjective measures in situations where new or interesting alternatives might influence
performance and obscure actual differences in OWL.

Vidulich (1987) restates the observation that 'subjective workload assessments are sensitive to
manipulations that influence the perceptual/central processing demands and relatively insensitive to
manipulations that influence response execution demands' (p. 8). Therefore, subjective measures are
particularly useful in situations where operators are system monitors and the primary tasks are involved
with perception and decision making.

Practitioners do need to be aware of possible causes of dissociation of subjective workload measures
and performance. Several practical implications have been mentioned as well as suggestions for ways to
handle them. The bottom line is that neither subjective nor performance measures should be used as the
sole basis for assessment of OWL.

Delay of Ratings


The effect of delay in operators giving ratings has been briefly touched upon in previous discussions.
The concern is that if operators are unable to give ratings in real time, the passage of time and the behavior
that follows may affect the subjective workload assessments that are made. Short delays of up to 15
minutes have not been found to have a significant effect on subjective ratings (Eggemeier et al., 1983;
Hart et al., 1984), although some differences were exhibited when a difficult intervening variable was
presented (Eggemeier et al., 1984).

Video recording the operators' activities can serve as an aid for collecting ratings in a post-test session.
This method was used and determined to be a viable alternative to real-time ratings by Gidcumb (1985)
when using SWAT. Although this method of videotaping activities for post-test visual recreation has not
yet been reported extensively in the literature, it appears to be a viable alternative when real-time ratings
are not available due to safety or other practical constraints. Another alternative would be the use of the
AHP technique to obtain relative comparisons of OWL during post-test sessions (Lidderdale, 1987;
Vidulich & Tsang, 1987).

Relative vs. Absolute Measurements

There are two ways in which ratings can be used. First, several OWL ratings can be used in a relative
sense to compare whether one task or activity has been perceived to have a higher workload than another
task or activity. Second, absolute subjective OWL ratings are intended to indicate the level of workload
without reference to any other task or activity. However, the question remains whether any subjective
workload rating scale can be used to make absolute judgments. Subjective opinion is largely based on
experience. As Lidderdale (1987) observes, "It is possible that all assessments of workload are made from
a baseline of comparisons with other elements in the flight and, if this is the case, all rating methods may
be relative" (p. 73). The absolute judgment of workload may be based on what has been experienced
previously; the highest workload experienced may be the touchstone of what is considered high
workload. If more difficult or intense tasks are performed, the touchstone for high workload may change. It
is uncertain whether individuals can possess an absolute scale for OWL that will remain stable over time.

Some OWL techniques are explicitly relative, such as magnitude estimation or the AHP. Other scales
address the issue in a different way. The NASA scales, for example, ask operators to judge the relative
importance of scale dimensions with respect to each individual task, thereby producing weightings for
individuals by task. Individuals' card sorts of SWAT have been found to be relatively stable over time and
the operational definitions of the levels of each dimension remain constant. It could be inferred, then, that
each individual would have an absolute workload scale. There is an anecdotal impression based upon
such an inference that a SWAT rating of 60 or greater indicates a high workload condition. Certainly, a
high rating on any scale should be pursued as a potential indication of OWL problems, although the
relative nature of subjective ratings may lead to inappropriate interpretations and conclusions. The relative
nature of subjective ratings also cautions against comparing systems evaluated in different studies (e.g.,
subsequent models of the same combat system).

Individual Differences

The issue of differences between individuals in the perception of workload is a continuing question of
interest to researchers and practitioners. In the context of subjective measures, the issue is one of
individual definitions of workload and what aspects of a particular task or activity are considered relevant to
the assessment of workload. The NASA-TLX was designed specifically to address this issue by using
individual weightings of the importance of each scale dimension to obtain a workload rating. This has
been shown to reduce the between-subjects variation (Hart & Staveland, 1987; Vidulich & Tsang, 1986).
The conjoint analysis used in SWAT seeks to account for individual differences through the scale
development (i.e., card sort) procedure as well as the development of prototype scales (Reid, Eggemeier,
& Nygren, 1982). The use of z-scores provides comparability between the widely varying scores
produced via magnitude estimation. However, because of the intersubject variability, OWL evaluation
must always use a sufficient sample of subjects; otherwise, mean scores obtained in tests may not provide
sensitivity.
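
The two adjustments for individual differences mentioned above can be illustrated with a minimal sketch. It assumes that each operator supplies one rating and one importance weight per scale dimension, and that magnitude estimates are standardized within each subject. The dimension names, function names, and the choice of a population standard deviation are illustrative assumptions, not details taken from the studies cited.

```python
from statistics import mean, pstdev

# Assumed dimension labels for illustration; any fixed set of scale dimensions works.
DIMENSIONS = ("mental", "physical", "temporal", "performance", "effort", "frustration")


def weighted_rating(ratings, weights):
    """Combine one operator's dimension ratings using that operator's own weights.

    Computes sum(weight * rating) / sum(weights), so dimensions the operator
    judged more important for the task contribute more to the overall rating.
    """
    total_weight = sum(weights[d] for d in DIMENSIONS)
    if total_weight == 0:
        raise ValueError("at least one weight must be non-zero")
    return sum(weights[d] * ratings[d] for d in DIMENSIONS) / total_weight


def z_scores(estimates):
    """Standardize one subject's raw magnitude estimates to mean 0 and SD 1.

    This removes differences in the number range each subject chose to use,
    so that scores from different subjects can be compared or averaged.
    """
    m = mean(estimates)
    sd = pstdev(estimates)
    if sd == 0:
        return [0.0 for _ in estimates]
    return [(value - m) / sd for value in estimates]
```

Under these assumptions, two subjects who give magnitude estimates of 10, 20, 30 and 100, 200, 300 for the same three tasks receive the same standardized scores, even though their raw numbers differ by a factor of ten.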

OWL intersubject variability is also of concern because of its potential implication for selection of
personnel to operate systems. Do differences between individual ratings reflect orderings of capabilities
for handling systems? Unfortunately, there is a dearth of information concerning the interrelationships
between individual differences in ratings of workload and information processing-related variables used by
the Army such as the ASVAB. OWL individual differences are therefore an area for investigation because
of the implications for the Army.


Questionnaires  and  Interviews 

The second broad area of subjective methods comprises those that use questionnaires, interviews and other
techniques to obtain estimates, judgments, evaluations, comparisons, attitudes, beliefs or opinions of
people (Dyer, Matthews, Wright, & Yudowitch, 1976). Such methods are frequently used and are seen as
useful (Meister, 1986). The major reason for the widespread use of such methods is that they are
perceived as easy and quick to administer, particularly in field test environments, and inexpensive to
develop and produce.

Questionnaires

Questionnaires are forms in which written questions are asked in a fixed order and format and to which
respondents write their answers. The questions may be open-ended, allowing respondents to write in
their own words and make any answer, or close-ended, where the choice of answers has been previously
established, such as multiple choice or true and false. Meister (1985) states that the results of studies
(Ellenbogen & Danley, 1962; England, 1948; Kohen, de Milter, & Myers, 1972; Prien, Otis, Campbell, &
Saleh, 1964; Scates & Yoeman, 1950) suggest that open-ended questions may provide unique
information, but close-ended questions are more reliable. A number of sources are available for guidance
in the development of questionnaires, including the advantages and disadvantages of various types of
questions, sequencing and wording of questions, etc. (Dyer et al., 1976; Meister, 1985; U.S. Army Test
and Evaluation Command, 1975).

The development of useful questionnaires requires attention not only to the choice of question types and
proper wording, but also to the content of the questions: what do they ask? They need to be designed to obtain
the desired information. Pretesting of questions to ensure their appropriateness to the desired end, as
well as planning of the data analysis, is important to a questionnaire's value.

The advantages of questionnaires are that they are less expensive and can be completed faster than
interviews or ratings. Questionnaires often can be handed out and collected without attaching names to
the answers; hence, they can be more anonymous than interviews. As a result of their anonymity,
questionnaires may garner more self-revealing and unfavorable reports than interviews, which rely on
one-on-one communication.

There are problems associated with the use of questionnaires in test and evaluation environments.
System experts may devise the questions and not have expertise in question development. Yet, as a
result of the ease of putting together a question, there is a tendency for questionnaire use to proliferate.
An example given by Taylor and Charlton (1986) describes widespread respondent burnout from having
answered too many questionnaires that were too long and not focused on the operator's activity. The end
result was a vast collection of meaningless data. The frequent use of poorly thought-out questionnaires
can result in data that are large in quantity but limited in usefulness.

A recent move has been toward computerized systems that produce well-constructed,
focused questionnaires for specific purposes. Enderwick (1987) describes a system where a catalog of
well-crafted questions on human factors test and evaluation topics is available to operational test directors
(OTDs), who may choose any number of applicable questions. The questions are printed out (with the
name of the equipment substituted for the word "equipment" so that the questions appear designed for
that test) and can be re-ordered by OTDs. The final package is printed out with a cover page, an
instruction page and the questionnaire. The computerized questionnaire system is designed for use by
people who do not necessarily have any human factors training. Test directors can design
questionnaires to meet their needs, and new questions can be added.

Taylor and Charlton (1986) have developed an automated adaptive questionnaire which uses a
branching concept to determine what questions will be asked contingent on the answers to previous
questions. The respondent answers general questions on a seven-point scale and, if the answer meets
some predetermined criterion (e.g., -2, with -3 being the most negative score), more detailed questions
will be asked. The contingency branching method is most suitable for computer implementation.
Computer implementation also allows on-site data analysis of answers. Note that questionnaire
procedures may incorporate a scaling method, thus blurring the distinction between rating scale and
questionnaire.
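
The contingency branching idea can be sketched in a few lines. This is not the Taylor and Charlton system; it is a minimal illustration that assumes a seven-point scale running from -3 to +3 and that "meeting the criterion" means an answer at or below the criterion value. The class, function, and question texts are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    text: str
    # More detailed questions, asked only when the answer meets the criterion.
    follow_ups: list = field(default_factory=list)


def questions_to_ask(question, answers, criterion=-2):
    """Return the question texts in the order they would be presented.

    `answers` maps question text to a score on the assumed -3..+3 scale.
    A score at or below `criterion` triggers the question's follow-ups.
    """
    asked = [question.text]
    score = answers.get(question.text)
    if score is not None and score <= criterion:
        for follow_up in question.follow_ups:
            asked.extend(questions_to_ask(follow_up, answers, criterion))
    return asked
```

With a general question that has two follow-ups, an answer of -2 produces all three questions, while an answer of +1 produces only the general question. Respondents therefore see detailed questions only in areas they have already flagged as a problem.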

Questionnaires are commonly used in test and evaluation environments (Enderwick, 1987; Meister,
1986). Anecdotal evidence indicates they are commonly used for workload assessment, although the
specific questionnaires are rarely found in the research literature. It certainly appears that the
development of workload questionnaires aided by computers could be helpful to Army analysts. It seems
that sufficient information is currently available to create a universal set of general workload assessment
questions that can be tailored for specific application. However, further development of this concept is
needed before such a tool is available for use.

Interviews 

The interview is an interpersonal interaction in which the interviewer seeks information or opinions from
the respondent. It permits more flexibility than the traditional questionnaire. It allows the interviewer to
follow up on the answers given and thereby gain insight into areas that may not have been addressed in a
written questionnaire. Disadvantages of the interview method are that it is very costly in time and, because
of the personal communication, respondents may be less likely to report anything negative and may also
be influenced (consciously or subconsciously) by the interviewer. Interviews can take place one-on-one
or with groups, as in the case of crew operations. Key questions can be determined ahead of time and
pretested for understanding, likely responses and the information they contain (Meister, 1985). Both
Dyer et al. (1976) and the U.S. Army Test and Evaluation Command (1975) provide information about
interview considerations, procedures and analysis. Interviews are useful to obtain unique information and
opinions about workload. Meister (1985) suggests that test participants should always be interviewed to
learn how the participant viewed the test situation. If the view was different from that intended by the test
director, then the data may have been affected.

Protocol Analysis

Protocol analysis requires operators to verbalize their thought processes or performance. This method
has been used extensively in computer interface research as a way to find out how an operator solves
problems or discovers the appropriate commands to use. Protocol analysis relies on the ability of the
subject to determine thought processes introspectively and then verbalize them, either during task
performance or afterwards. Verbal protocol is listed as an available subjective workload method (Hart,
1986a). Brown (1982) writes that such verbal reports can be very informative, but during high workload,
operators may not have the time to provide complete information. (In a sense, the verbal report is a
secondary task.) Verbal protocols and analysis may provide useful information, particularly in computer
interfaces where such techniques have previously been used.

Summary 

Questionnaires and interviews provide an important adjunct to workload estimation. Proper
questioning can provide insight into the causes of problems associated with workload. Furthermore,
questionnaires and interviews provide an opportunity for subjects to give their detailed impressions of
system operation and how it might be improved. Rating scales are usually too highly structured to provide
detailed, subtle impressions. Questionnaires and interviews require careful construction and should be
used to obtain more detailed information in all workload assessments. Possible enhancements to these
subjective measures are an automated questionnaire design tool and the use of protocol analysis.

Summary  and  Conclusions 

The need for subjective techniques for workload assessment in applied settings has been identified
and substantial efforts have been directed toward obtaining a solution, as evidenced by the amount of
research performed and reported. Several recommendations can be made based on the review and
analysis of the subjective techniques and the issues involved in their use:

•  Subjective measures can provide valuable information concerning the operators'
perception of their workload experience in specific tasks or activities. Subjective
techniques have been demonstrated to be sensitive and should be included
in an evaluation wherever possible.

•  The questions of interest to the system designer or evaluator should be defined
before choosing a technique by which to obtain answers. Overall workload ratings,
such as the Modified CH, will provide a global assessment and can identify potential
problems or workload chokepoints. More specific information, like that available
through multidimensional scales or questionnaires and interviews, will be necessary
to diagnose specific sources of workload and identify solutions.

•  The  value  of  qualitative  information,  like  that  obtained  from  questionnaires  or 
interviews,  should  not  be  underestimated. 

•  All  subjective  measures,  including  questionnaires  and  interviews,  must  be  carefully 
planned  and  implemented  to  obtain  valid  and  useful  data. 

•  The OWL evaluator should be aware of the measurement scale characteristics (ordinal
vs. interval; uni- vs. multidimensional) and the extent to which these characteristics will
influence the interpretation of results and the conclusions that can be appropriately
drawn.

•  Multidimensional scales, like NASA-TLX and SWAT, offer the opportunity for using
the subscale ratings in diagnosing OWL with respect to specific system design
characteristics.

•  Available evidence indicates that Modified Cooper-Harper, NASA-TLX and its
predecessor, and SWAT are sensitive to differences in workload. Substantial
research supports their use in OWL assessments. Less information is currently
available on the Bedford scale and AHP. The original Cooper-Harper scale has been
found to be particularly sensitive to psychomotor tasks in aircraft environments. It is
not known if it is equally sensitive to psychomotor tasks in other system control activity
(e.g., tank operation).

•  Psychometric  scaling  techniques  have  been  shown  to  be  sensitive  to  differences  in 
task  manipulations.  These  are  viable  alternatives  although  a  certain  degree  of 
knowledge  concerning  these  techniques  is  required  in  order  to  meet  necessary 
design,  implementation,  and  mathematical  requirements. 

•  The  use  of  observers  as  well  as  operators  to  make  OWL  ratings  is  an  alternative  in 
workload  assessment,  although  trade-offs  in  information  quality  exist. 

Subjective OWL assessments can provide useful and valid information for the Army if there is: (a)
careful definition of questions to be answered; (b) careful selection of technique; and (c) careful,
consistent implementation of technique in a laboratory, simulator, or field environment.


CHAPTER 6. SECONDARY TASK TECHNIQUES


An important reason for measuring operator workload derives from the objective of designing human-
machine interfaces that will optimize system performance. To do so requires knowledge of the work
capacities and limitations of the human operator.

Secondary task techniques have been employed as a tool to assess the work capacities and limitations
of the human operator with respect to primary task performance. Typically, the secondary task paradigm is
used in applied settings to assess the workload associated with a primary task such as piloting an aircraft.
To derive the workload associated with the primary task, the operator is required to perform an additional or
secondary task simultaneously with the primary task. The relative workload associated with the primary
task is reflected in the levels of performance on the secondary task. That is, because primary task
performance requires the utilization of the resources and capabilities of an operator, secondary task
performance will reflect the remaining resources and capabilities or relative spare capacity of an operator.
For example, if the operator is fully loaded by the primary task, performance on the secondary task may be
unacceptable. By contrast, if the operator is only partially loaded by the primary task, performance on the
secondary task should be acceptable. (See Chapter 2 for a description of the relation between human
performance and operator workload.)

A critical aspect of the secondary task paradigm is the determination of acceptable and unacceptable
performance on the secondary task. This determination may be accomplished by establishing the
performance level on the secondary task without the primary task, and then comparing this baseline
performance level to secondary task performance with the primary task. The determination may also be
accomplished by varying the difficulty of the secondary task while the operator maintains the primary task
performance. Then the comparison is on secondary task performance across the levels of difficulty.
Through these various manipulations, the secondary task paradigm offers the practitioner a means to
assess the relative workload associated with a primary task which may not be apparent from primary task
measures alone.
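
The baseline comparison described above can be written as a simple calculation. The sketch below assumes that secondary task performance is scored so that higher is better (for example, correct responses per minute) and that primary task performance is held stable across conditions. The function names and the proportional-decrement index are illustrative choices, not a measure prescribed by the literature reviewed here.

```python
def secondary_task_decrement(baseline, dual_task):
    """Proportional drop in secondary task score relative to its single-task baseline.

    0.0 means no loss (ample spare capacity); 1.0 means the secondary task
    could not be performed at all while doing the primary task.
    """
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return (baseline - dual_task) / baseline


def rank_primary_conditions(baseline, dual_scores):
    """Order primary task conditions from lowest to highest inferred workload.

    `dual_scores` maps a condition name to the secondary task score obtained
    while the operator performed the primary task under that condition.
    """
    return sorted(
        dual_scores,
        key=lambda condition: secondary_task_decrement(baseline, dual_scores[condition]),
    )
```

For a baseline score of 50 and dual-task scores of 45, 38, and 30 under three primary task conditions, the decrements are 0.10, 0.24, and 0.40, so the third condition would be interpreted as leaving the least spare capacity.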


Secondary  Task  Paradigm:  A  Solution  or  a  Problem? 

The secondary task paradigm encompasses several techniques that have been employed to assess
the spare capacity and resources available for additional work when performing a primary task. There have
been many reviews on this topic over the past 25 years. For example, Knowles (1963), O'Donnell and
Eggemeier (1986), Ogden, Levine, and Eisner (1979), Rolfe (1971), and Williges and Wierwille (1979)
have all provided reviews of the techniques as well as the methodological issues associated with their
usage. Because secondary task techniques have received a considerable amount of attention, it would
seem appropriate to assume that guidance in the use of such techniques would be straightforward and
readily available. This is not the case.

To illustrate, Gopher and Donchin (1986) and O'Donnell and Eggemeier (1986) disagree with each
other in the same volume of the Handbook of Perception and Human Performance concerning the
methodological issues that should be addressed in implementing a secondary task technique.
Specifically, O'Donnell and Eggemeier support the claim that the secondary task must not interfere with the
performance of the primary task. By contrast, Gopher and Donchin take exception to such a position and
support the position that it is legitimate for secondary tasks to interfere with performance of the primary
task.

For the practitioner concerned with operator workload, such mixed messages regarding the
implementation of secondary task techniques are troublesome. In fairness to the authors just cited, their
apparent disagreement illustrates the differences in opinion found in the literature on how best to select,
implement, and interpret secondary task measures. The reasons behind such disagreements are based
on: (a) theoretical grounds, (b) the findings from the plethora of secondary task studies, and (c) practical
considerations. These are briefly discussed below as background for our approach to the use of secondary
tasks.

Theoretical 

Measure Spare Capacity. One theoretical position has espoused the secondary task paradigm as
a tool to provide an uncontaminated measure of the spare capacity or resources not expended with a
primary task (Kahneman, 1977). This is the view of O'Donnell and Eggemeier (1986). Such a theoretical
position requires the primary task performance to be stable when the secondary task is concurrently
performed with the primary task. Then and only then can changes in secondary task performance be
interpreted as a reflection of spare capacity left over from primary task demands. (See Kantowitz [1985] for
a critique of the spare capacity concept and the problems associated with such a concept as it relates to
measures of performance.)

Load the Operator. A different theoretical perspective is to view the objective of the dual task
paradigm to be the measurement of the operator's ability to perform two tasks adequately and concurrently
(Schneider & Fisk, 1982). This is the view of Gopher and Donchin (1986). This theoretical position
maintains that changes in primary task performance when a secondary task is performed
concurrently reflect an inefficiency in human performance in the dual-task situation as opposed to a
methodological flaw.


Wickens' Resource Model. Another important theoretical formulation is Wickens' Resource Model
(Wickens, 1980). The resource model has been offered as a guide for secondary task selection with
respect to the nature of such tasks (O'Donnell & Eggemeier, 1986). The model depicts the overall human
information processing system as composed of multiple but separate processing structures/resources,
each of which can have capacity limitations and be a potential bottleneck in the human processing system.
These separate processing structures are defined along the following three dichotomous dimensions:

•  stages  of  information-processing  (perceptual/central-processing  operation  vs. 
response  selection  and  execution), 

•  modalities  of  perception  (auditory  vs.  visual),  and 

•  codes  of  information  processing  and  response  (spatial-manual  vs.  verbal-vocal). 

Each processing structure has its own limited supply of resources which are not interchangeable with
other processing structures. It is suggested that the secondary task be selected so that it draws on the same
processing structures as the primary task. In this manner, the secondary task is more sensitive in
identifying the level or amount of spare capacity (O'Donnell & Eggemeier, 1986). In support of this
position, Shingledecker, Acton and Crabtree (1983) conducted a study that demonstrated that the
sensitivity of the secondary task performance varied as a function of the primary task resource demands
according to Wickens' model. Such results are promising, but it may not be readily apparent which
processing structures are dominant in performance on a complex system such as a helicopter.
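
The selection rule described above can be sketched in a few lines of Python. The sketch encodes each task by the level it occupies on each of the three dichotomous dimensions and counts how many dimensions a candidate secondary task shares with the primary task. The candidate names, their codings, and the simple overlap count are hypothetical and serve only to make the idea concrete; they are not taken from Wickens (1980) or from the studies cited.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceProfile:
    """Levels a task occupies on the three dichotomous dimensions."""
    stage: str     # "perceptual/central" or "response"
    modality: str  # "auditory" or "visual"
    code: str      # "spatial-manual" or "verbal-vocal"


def shared_dimensions(primary, secondary):
    """Count the dimensions on which two tasks use the same structure."""
    return sum(
        getattr(primary, dim) == getattr(secondary, dim)
        for dim in ("stage", "modality", "code")
    )


def rank_candidates(primary, candidates):
    """Sort candidate names from most to least overlap with the primary task."""
    return sorted(
        candidates,
        key=lambda name: shared_dimensions(primary, candidates[name]),
        reverse=True,
    )


# Hypothetical codings, for illustration only.
primary_task = ResourceProfile("response", "visual", "spatial-manual")
candidate_tasks = {
    "candidate_a": ResourceProfile("perceptual/central", "auditory", "verbal-vocal"),
    "candidate_b": ResourceProfile("response", "auditory", "spatial-manual"),
    "candidate_c": ResourceProfile("response", "visual", "spatial-manual"),
}
ranking = rank_candidates(primary_task, candidate_tasks)
```

Under these assumed codings, the candidate sharing all three structures with the primary task is ranked first. As the text notes, the difficult step in practice is deciding how a complex primary task should be coded in the first place.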

Summary.  Therefore,  depending  upon  one's  theoretical  position  and  primary  interest  in  using  the 
secondary  task  paradigm,  the  selection  of  a  particular  secondary  task  technique  will  vary.  As  a  result, 
secondary  task  selection  in  applied  settings  may  still  be  difficult. 

Results  of  Studies:  What  to  Believe? 

Another  contributing  factor  to  the  apparent  confusion  concerning  the  appropriateness  of  various 
secondary  task  techniques  stems  from  the  difficulty  of  simply  interpreting  reported  findings.  For  example, 
Ogden et al. (1979) provided a table containing 144 secondary task studies and listed the major findings
from these studies. Perusal of the table reveals that for most secondary tasks one study can be cited to
show improvement, another degradation, and a third no change in secondary task performance. It is
consequently not readily apparent which secondary tasks are most appropriate to assess OWL. Appendix
A contains a detailed review of the secondary task literature which illustrates this complexity.

Practical Considerations: Laboratory vs. Applied Context


The vast majority of the work done with secondary tasks has been conducted in controlled laboratory
situations. As a result, secondary tasks that have been found sensitive and capable of measuring spare
capacity in laboratory situations may not be applicable in applied settings. For safety reasons, for example,
flying a helicopter precludes the use of any secondary tasks that would possibly interfere with the pilot's
ability to maintain control over the aircraft. Another practical consideration is that the elaborate
experimental procedures usually required to implement a secondary task paradigm may be excessive for
many system development efforts in terms of manpower and time constraints.

The remaining sections of this chapter consider secondary task techniques as used in applied
workload assessment settings. This chapter differs from other chapters in that we refrain from reviewing
all the literature at this point for two reasons. First, there is a tremendous volume of literature. Second, most of
the literature is theoretical and academic in nature. Although much of the discussion in the literature is of
considerable importance in understanding how cognitive components of workload impact human
performance per se, it may be of secondary importance to the individual evaluating a system. As an
alternative, we have opted to put the more general review of the literature in an appendix (Appendix A) so
that it is available for the interested reader. We will now discuss our approach and then we will present
examples concerning design issues and suggest applications of secondary tasks.

Our  Approach 


The problems facing the practitioner interested in workload assessment are: (a) knowing the
circumstances in which secondary task techniques are appropriate, and (b) knowing which ones to use. Our
approach is a systematic attempt to provide such answers in identifying appropriate secondary task
techniques within the context of a system development effort. The approach is guided by a very
pragmatic philosophy. That is, secondary tasks offer utility for a system development effort when such
tasks are used to load the operator and drive him to the performance envelope boundary. The purpose is
to determine how much more the operator can do. This chapter describes the most practical secondary
task techniques that allow such measurements.

To evaluate the appropriateness and utility of secondary tasks in applied settings, it was deemed
necessary to examine the specific design issues or concerns that would call for using secondary task
techniques. It is important to recognize that the basic secondary task paradigm encompasses several
different techniques which manipulate or vary the parameters of secondary tasks in order to identify
potential OWL problems with a primary task. We judge these various techniques for their appropriateness
in answering specific design questions by reviewing the literature in support of such techniques. The
utility of secondary task techniques was also examined with respect to meeting the Army need for
workload techniques that are relatively easy to implement, can be used to identify OWL problems within
complex systems, and provide relatively straightforward application for data collection and analysis.

Secondary  Task  Techniques  In  Applied  Settings 

Previous reviews noted that secondary task techniques are most applicable to early design stages of
systems in controlled laboratory settings (e.g., Schiflett, 1976; Wierwille & Williges, 1979). Several factors
have been suggested for their lack of use or applicability during the later phases of the system
development process (Ogden et al., 1979; Shingledecker et al., 1980). For example, Ogden et al. (1979)
noted that secondary task techniques may not receive uniform operator acceptance. As a result,
operators possibly will run the gamut from neglecting the secondary task altogether to assigning it such a
high priority that it artificially contaminates and changes the test situation. In either case, the results from
such test situations will not accurately assess the amount of resources committed to the primary task or the
amount of spare capacity remaining.

More recently, researchers have suggested the applicability of some secondary task techniques for
use in simulations as well as the later phases of system development (e.g., Bortolussi, Kantowitz & Hart,
1986; Shingledecker, 1987). These techniques are designed to alleviate problems such as:

•  instrumentation limitations which preclude the incorporation of secondary tasks into system
prototypes or high fidelity simulators,

•  potential task intrusion caused by the use of secondary tasks, and

•  poor operator acceptance of secondary tasks (Shingledecker, 1987).

Such techniques offer great promise for the Army since they seem to overcome the potential objections
concerning operator acceptance and artificial intrusiveness on primary task or system performance. In
addition, these techniques are relatively easy to implement. Four specific design and development
examples in which these secondary task techniques offer the greatest utility are described in subsequent
sub-sections. For each of the examples, a brief description will be given and subsequently discussed with
regard to appropriate secondary task techniques. These discussions are intended only to provide
sufficient detail for understanding secondary task techniques. Following discussion of the examples,
consideration of other potential secondary tasks will be given. Our integrated approach to a workload
assessment battery containing several different types of techniques is discussed in Chapter 8.

System Design and Development Example 1


Description. Successful operation of a system requires that the operator routinely perform several
tasks in order to carry out a mission (e.g., tracking targets, radio communications, weapon delivery, etc.).
You are interested in knowing whether an operator can adequately perform these tasks. Specifically, are
there limits in the operator's capability to perform these tasks, such that if the limits are exceeded the
operator's performance deteriorates? (This is an example of overloading the operator, although
experiments do not always show reduced performance on the primary task.)

Considerations for Secondary Task Techniques. The embedded secondary task technique
developed by Shingledecker and colleagues (Shingledecker, Crabtree, Simons, Courtright, & O'Donnell,
1980; Shingledecker & Crabtree, 1982) offers a means for such an assessment. The concept of the
embedded secondary task is based on overcoming the problems of implementation, intrusiveness, and
operator acceptance mentioned earlier. The embedded secondary task technique alleviates these
concerns by utilizing as the secondary task an existing sub-task of the system, such as radio
communications, that is fully integrated with existing system hardware and software and with the operator's
conception of the mission environment. To illustrate, Shingledecker and Crabtree (1982) reported a study in which
they used the radio communications task in an aircraft environment as the embedded secondary task.
They scaled the various task loading properties of several radio communication messages. The task load
is the work pilots are required to perform in response to radio messages such as a request for a radio
frequency change or a request for traffic information. Based on their scaling of radio message task load,
they were able to infer that increased communication load produced decrements in the primary task
performance of operator tracking in a flight simulator. Similarly, radio messages that were more demanding
also elicited signs of overload in secondary task performance; the operator took longer to perform
required actions in response to such messages when compared to control conditions, i.e., the radio task
by itself. Such findings encourage the use of embedded secondary tasks in assessing the limits of
operators' workload capabilities.

Similar findings have been reported in several simulation studies (e.g., Wierwille, Casali, Connor, &
Rahimi, 1985). The primary task in the studies reported by Wierwille et al. (1985) required pilots to
maintain a steady course under simulated conditions (e.g., mild random crosswind). Within each study, a
task that can be described as an embedded secondary task was manipulated to increase the demands on
the operator by the tasks. For example, in one study, they varied the number of warning and emergency
lights that pilots were required to detect (monitoring task). In another study, they varied the complexity of
wind-triangular course problems (navigation tasks) to be solved during the simulated flight. In a third
study, they varied the number of occurrences of the pilot's call sign (radio communications task) to which
pilots responded. Wierwille et al. (1985) do not classify these manipulations of task parameters as
embedded secondary tasks, although their use of such sub-tasks fits the embedded secondary task
paradigm.

The results from the Wierwille et al. studies were quite revealing. In all cases, the manipulation of the
embedded secondary task demands resulted in reduced pilot performance on secondary tasks, while the
performance measures for the primary task remained relatively stable. Such findings are indicative of
pilots' capacity to handle workload demands (i.e., spare or reserve capacity as assessed by embedded
secondary tasks). These results are further substantiated by the fact that a more traditional secondary task
technique, time estimation, was also used in these studies and exhibited similar results.

Finally, Chiles and Alluisi (1979), with a multiple-task performance battery (MTPB), have used logic similar
to that employed in the embedded secondary task paradigm. In particular, they assumed that the monitoring tasks
in their task battery were acting as secondary tasks. Based on this assumption, they used the monitoring
task results as indices of workload imposed by different combinations of the other, time-shared, active
primary tasks to develop a workload metric. Taken together, these results lead to the
conclusion that the workload associated with time-shared multitask systems can be assessed by use of
the embedded secondary task technique.

System Design and Development Example 2

Description. You have two alternative designs of a system or sub-system which have been shown
by previous testing to be essentially the same (no differences) with respect to primary task measures. In
this situation, you are faced with what appears to be two comparable designs. Which system design do
you choose? (This example is one in which the practitioner might like to determine where the operator is
in the workload performance envelope. However, this determination is not absolutely essential to answer
the question.)

Considerations for Secondary Task Techniques. Besides the potential cost factor differences
between the two designs, there may also exist operator workload differences that are not being reflected
by primary task measures. The secondary task paradigm can be used to determine if either of two designs
is less demanding on the operator. This is important because the less demanding design will leave more
spare or reserve capacity so that the operator can perform the mission tasks under more demanding
conditions (e.g., combat) than those investigated.

Embedded Secondary Task. The embedded secondary task technique is applicable to this
design example if the two alternative designs have subtasks that can be used as the secondary task. In
fact, Shingledecker (1987) describes a situation similar to the design example described above in which
the embedded secondary task technique is offered as the vehicle to identify the most appropriate design
alternative.

If the embedded secondary task technique cannot be applied, there are several traditional secondary
tasks which may be useful. These secondary tasks have been used to determine the spare capacity of
operators when engaged in complex system operations such as flying an aircraft.

Time Estimation Secondary Task. Typically with this task, operators are required to produce time
intervals of 10 seconds' duration without using counting, tapping or any sort of direct timing procedures
(see Hart, 1978, for the merits of such a time estimation procedure). The premise behind this procedure is
that the busier an operator is, the less attention is available to judge time accurately and as a result the
subjective impression of time becomes less accurate with respect to objective time. That is, operators will
produce longer and more variable estimates of 10-second time intervals because they lose track of time.
To illustrate the use of this technique, Bortolussi, Kantowitz and Hart (1986) conducted a study with pilots
in a Singer-Link GAT-1 flight trainer. Two full-mission instrument-flight-rule scenarios (high and low
workload scenarios) were utilized. In addition, each scenario was designed to contain flight segments that
varied in difficulty. The results were such that the time production secondary task discriminated between
low and high workload scenarios (i.e., longer time intervals for the high workload scenarios). Furthermore,
it discriminated among individual flight segments in the high workload scenario but did not in the low
workload scenario. The variability of time productions was also greater for the high workload scenarios
than for the low workload scenario. Similar results have been reported by other researchers. For example,
in a series of studies, Wierwille et al. (1985) found the variability of time productions (i.e., standard
deviation) discriminated between various workload conditions that were manipulated within a flight
simulator. The workload conditions involved task loadings or workload levels on either psychomotor,
perceptual, mediational, or communication task components of the flight simulator. The merits of using
the time estimation production technique for assessing the relative workloads of two comparable design
alternatives are several. It requires little instrumentation or training and can be included as a normal part of
an operator's duties without interfering with such duties (Hart, 1986).
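
The two scores discussed above, the length and the variability of produced intervals, can be computed as in the following minimal sketch. The data values are invented for illustration and are not taken from Bortolussi et al. (1986) or Wierwille et al. (1985); the function name and the signed-error summary are likewise assumptions.

```python
from statistics import mean, stdev

TARGET_SECONDS = 10.0  # interval the operator is asked to produce


def summarize_productions(durations):
    """Summarize produced intervals (seconds) by mean, SD, and mean signed error."""
    m = mean(durations)
    return {
        "mean": m,
        "sd": stdev(durations),            # sample standard deviation
        "mean_error": m - TARGET_SECONDS,  # positive means intervals ran long
    }


# Invented productions from one operator under two design alternatives.
design_a = [9.8, 10.4, 10.1, 9.9, 10.3]
design_b = [11.2, 13.5, 10.6, 14.1, 12.1]

summary_a = summarize_productions(design_a)
summary_b = summarize_productions(design_b)
```

Following the premise stated above, the design with longer and more variable productions (design B in this invented data) would be read as leaving less spare capacity. Whether an observed difference is reliable would still have to be established with an appropriate statistical test.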

Choice Reaction Time Secondary Task. Another secondary task that is relatively easy to use is
choice reaction time. This technique involves operators responding to several visually presented stimuli
(e.g., a light emitting diode with arrows pointing in different directions), with each stimulus requiring a
different response such as different buttons to press. To illustrate, the Bortolussi et al. study (1986) cited
above also included 2 and 4-choice reaction time tasks that pilots performed during the flight scenarios.
Mean reaction time scores for both 2 and 4-choice reaction time tasks discriminated between low and high
workload scenarios. The choice reaction time tasks also discriminated the different workload levels among
different flight segments. These results have been replicated in another study by Bortolussi et al. (1987).
The merits of using choice reaction time as a secondary task lie in its simplicity, ease of implementation,
and ease of interpretation of results. Moreover, its sensitivity follows the theoretical basis for its use as a
secondary task; that is, it reflects central information-processing demands as well as response selection
demands.

System Design and Development Example 3

Description. You have a system that is under the Product Improvement Program (PIP) for
enhancements or modifications. You are interested in whether the operator can handle the new
capabilities and/or new functionality that is planned.

Considerations for Secondary Task Techniques. If the specific enhancement is definable as a
new subtask, it can be examined easily within the framework of the embedded secondary task technique.
The new task can act as a secondary task. The demands (e.g., the timing and number of radio messages
received with a communications task) associated with the new task can be varied to examine its effects on
the operator's performance with the existing system. By employing the embedded secondary task
technique, it is possible to elucidate the conditions under which the new task may, in fact, hinder operator
performance. Another variation of the embedded secondary task technique would involve setting the
new task aside and manipulating an existing subtask of the system as the secondary task in order to
determine the limits of operator performance. By so doing, it is possible to estimate the spare capacity an
operator would have for a new subtask.

If these variations of the embedded secondary task technique cannot be applied, there are several
other secondary tasks which may be useful. Scenarios for system usage can be developed within which
time intervals can be identified for the operator involvement with the new task. A secondary task can be
substituted for the proposed task to examine the spare capacity that would be available to perform the
new task within the context of the system's other requirements (tasks) placed on the operator. Choice
reaction time and time estimation are two secondary tasks that may be applicable for these circumstances.
Bortolussi, Hart, and Shively (1987) provide evidence for the use of secondary tasks in a synchronized
manner with specific scenario events in order to identify changing workload levels within the context of a
flight simulator. They synchronized the presentation of a choice reaction time task and time production
interval task to specific events during high and low workload flight scenarios. By so doing, they were able
to discriminate with both secondary tasks between high and low workload scenarios. They suggested that
these results could be further examined by a detailed time-line analysis to localize the specific events that
produced the apparent differences between flight workload scenarios. This is similar to the proposed use
of secondary tasks being offered in this section.

System Design and Development Example 4

Description. You have a system under test and evaluation. You are not only interested in
knowing whether the system can be handled by operators within the context of a mission scenario but
also where potentially high operator demand areas lead to operator workload problems. Clearly,
loading the operator will show performance deficiencies that identify the high workload areas.

Considerations for Secondary Task Techniques. It is quite probable that primary task measures
will answer the direct question concerning the operator's capacities to handle the system within a set of
conditions tested. With respect to identifying the areas that are relatively high in workload demands, the
embedded secondary task paradigm can be utilized. The supposition is that the operator can be driven to
performance limits by various task loadings on the designated secondary task. By so doing, breakdowns
in human performance can be identified that may otherwise not be shown with primary task measures
under normal circumstances. Additionally, if there is a possibility to break the mission into segments,
examination of the performance within segments will help to identify the problem areas.

Another possible method is to synchronize secondary task presentations to specific primary task
sequences that may be suspected of high workload but may not be reflected by primary task measures.
You are, in essence, attempting to identify momentary high workload areas that may under stressful
circumstances contribute to poor operator performance. The Bortolussi et al. (1987) article described
above is an example of using secondary tasks in this manner. Based on this study, choice reaction time
tasks and time interval production tasks may be appropriate for such a type of operator workload
assessment.
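
One way to organize data from secondary task probes that are synchronized to mission segments is sketched below. Probe reaction times are grouped by segment, and segments whose mean reaction time exceeds a single-task baseline by a chosen ratio are flagged for closer examination. The segment names, the baseline value, the 1.25 ratio, and the data are all hypothetical; the cited studies do not prescribe this particular rule.

```python
from collections import defaultdict
from statistics import mean


def mean_rt_by_segment(probes):
    """Group (segment, reaction_time_seconds) pairs and average within segment."""
    grouped = defaultdict(list)
    for segment, rt in probes:
        grouped[segment].append(rt)
    return {segment: mean(rts) for segment, rts in grouped.items()}


def flag_segments(probes, baseline_rt, ratio=1.25):
    """Return, sorted by name, segments whose mean RT exceeds baseline_rt * ratio."""
    means = mean_rt_by_segment(probes)
    return sorted(s for s, m in means.items() if m > baseline_rt * ratio)


# Invented probe data: (mission segment, choice reaction time in seconds).
probe_log = [
    ("takeoff", 0.62), ("takeoff", 0.58),
    ("cruise", 0.45), ("cruise", 0.47),
    ("approach", 0.71), ("approach", 0.69),
]
flagged = flag_segments(probe_log, baseline_rt=0.45)
```

With this invented data, the takeoff and approach segments are flagged while cruise is not. A flagged segment is only a candidate for the detailed time-line analysis mentioned earlier, not a demonstrated workload problem.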

Other  Secondary  Tasks 

With respect to secondary task techniques not specifically described in this chapter, there are several
that have been shown to be sensitive to operator workload levels. For example, the Michon Interval
Production Task (IPT) requires the subject to generate a series of regular time intervals by executing a
motor response such as a finger tap every two seconds (Michon, 1964). The IPT has been shown by
Shingledecker et al. (1983) to be sensitive to psychomotor task loadings for primary tasks but not other
types of task loadings such as memory and perceptual sustained attention. Accordingly, the Michon
paradigm seems appropriate for assessing psychomotor workload. What limits its applicability is the fact
that the operator must perform the IPT with one hand devoted continuously to the task, which may
limit its use with complex systems that require operators to have free use of both hands.
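
As a rough illustration of how regularity in an interval production task might be scored, the sketch below converts tap timestamps into inter-tap intervals and reports their coefficient of variation. This score is an assumption chosen for simplicity; it is not necessarily the measure used by Michon (1964) or Shingledecker et al. (1983), and the tap times are invented.

```python
from statistics import mean, stdev


def intertap_intervals(tap_times):
    """Convert tap timestamps (seconds) into successive inter-tap intervals."""
    return [later - earlier for earlier, later in zip(tap_times, tap_times[1:])]


def irregularity(tap_times):
    """Coefficient of variation (SD / mean) of the inter-tap intervals."""
    intervals = intertap_intervals(tap_times)
    return stdev(intervals) / mean(intervals)


# Invented tap times for a nominal two-second interval.
taps_alone = [0.0, 2.0, 4.0, 6.0, 8.0]         # IPT performed by itself
taps_with_task = [0.0, 1.6, 4.1, 5.7, 8.4]     # IPT alongside a primary task

score_alone = irregularity(taps_alone)
score_with_task = irregularity(taps_with_task)
```

A larger score indicates less regular tapping, which under the logic of the paradigm would be read as greater loading from the concurrent primary task.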

The Sternberg Memory Task (Sternberg, 1966) has also been shown to be sensitive to operator
workload levels (e.g., Spicuzza, Pincus, & O'Donnell, 1974). The task involves memorizing a set of items,
usually digits. Later in testing, a single probe digit is presented. The operator's task is to indicate whether
the probe was in the memorized (positive) set. Both response time and accuracy are measured. Memory
load may be varied by including different numbers of items in the memory set. Research shows that
reaction time to the probe increases linearly with the number of items in the memorized set. In this way, a
slope can be determined which reflects the rate of memory search and the degree of cognitive loading.
Wickens, Hyman, Dellinger, Taylor, and Meador (1986) reviewed seven studies that employed the
Sternberg task in flight simulators or aircraft environments. The power of the Sternberg task lies in its
potential diagnostic value in distinguishing between cognitive processing task loading and response task
loading for a primary task. This requires analysis of the Sternberg task data for changes in the slope and
intercept of such data in order to infer task loadings on either cognitive processing or response selection
as a result of primary task demands. Based on their review, Wickens et al. (1986) have questioned the
utility of such an analysis in a typical operator workload assessment situation since the studies reviewed
reported a high degree of instability in the slope and intercept data. As a result, Wickens et al. (1986) have
recommended the use of one level of the Sternberg task as a general memory secondary task to infer
operator workload levels. Wickens et al. (1986) have also noted that the Sternberg task may be
insensitive to high workload levels because pilots may shun the task under high workload.
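
The slope and intercept analysis described above amounts to fitting a straight line to mean reaction time as a function of memory-set size. The sketch below does this with an ordinary least-squares fit written out by hand. The set sizes and reaction times are invented for illustration, and the function name is an assumption.

```python
def fit_line(set_sizes, reaction_times):
    """Ordinary least-squares fit of reaction time on memory-set size.

    Returns (slope, intercept); with times in milliseconds the slope is
    milliseconds per additional item in the memorized set.
    """
    n = len(set_sizes)
    mean_x = sum(set_sizes) / n
    mean_y = sum(reaction_times) / n
    sxx = sum((x - mean_x) ** 2 for x in set_sizes)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(set_sizes, reaction_times))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented mean reaction times (ms) at four memory-set sizes.
set_sizes = [1, 2, 4, 6]
rt_alone = [440, 478, 554, 630]          # Sternberg task by itself
rt_with_primary = [520, 558, 634, 710]   # Sternberg task during a primary task

slope_alone, intercept_alone = fit_line(set_sizes, rt_alone)
slope_dual, intercept_dual = fit_line(set_sizes, rt_with_primary)
```

In this invented data the slope is the same in both conditions and only the intercept rises, which under the diagnostic logic above would not point to added memory-search loading. Given the instability in slope and intercept data reported by Wickens et al. (1986), such a reading should be treated with caution in applied settings.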

Conclusions


Of all secondary task techniques, the embedded task offers the most practical utility for the Army. By
utilizing this technique, one may overcome many of the problems identified in using these largely
laboratory oriented secondary tasks in applied system evaluation environments. The principal advantage
of the embedded secondary task technique is that the data collected are generally applicable with respect
to design and system evaluations.

The other secondary tasks offered in the examples are possible alternatives that may be applied when
the embedded secondary task technique is not feasible. However, these other techniques are offered
with two cautionary notes. First, as shown in Appendix A, all secondary tasks can sometimes intrude on
primary task performance. This possibility cannot be ruled out for the secondary tasks described in this
review. However, the secondary tasks recommended here are ones that have been shown to minimize
this potential confounding in most situations. A second consideration for these alternative secondary task
techniques is their sensitivity to reflect primary task demands (i.e., workload). That is, these techniques
may not always be applicable for a particular situation. They have been shown to be sensitive to workload
levels in complex aircraft systems, but have not been fully exercised with other types of complex systems
of interest to the Army.

The recommendations offered should not be interpreted to mean that other operator workload
techniques are inappropriate in the circumstances described (e.g., subjective techniques). Indeed, we
recommend that secondary task techniques be utilized as part of a battery of both subjective scales and
other empirical methods. In this way, the information obtained from several diverse techniques can
compensate for limitations in each individual method. Chapter 8 amplifies upon the breadth of these other
techniques that are also appropriate for the situations described above.

CHAPTER 7. PHYSIOLOGICAL TECHNIQUES


Physiological techniques assess the operator's workload in a way different from primary and subjective
techniques. Primary techniques sample directly observable responses of the operator, the operator's
behavioral output resulting from some task. Workload (the relative capacity to respond) is inferred from the
number, latency, or pattern of the responses. Subjective workload techniques assess the judgments of
the operator about workload. These judgments are related to and directed toward such factors as
frustration, difficulty, time pressures, etc. Workload is inferred from the judgments. By contrast,
physiological techniques assess activities which are normally not directly observable and represent
indices of the underlying processes involved in responses.

The OWL physiological literature has a well developed empirical, statistical and mathematical
foundation. Further, many quite different techniques have been used. However, many authors often
assume, apparently, that the reader understands the underlying physiology and the authors do not
present the thinking and physiological rationale behind the application. When this happens, the various
techniques used may appear to be a 'grab bag' of techniques. In fact, the various techniques
sample a range of quite different physiological systems and mechanisms. By and large, all of the
techniques are based on sound physiological evidence; however, some techniques can be highly
specific to a single physiological subsystem. Accordingly, it will be helpful to the reader to discuss the
physiological basis and delineate not only the rationale, but also the physiological systems and
mechanisms being measured.

In order to understand the application and the results obtained from such techniques it is necessary to
place each technique into an appropriate physiological context. First, we will discuss some measurement
issues including the theoretical basis for each measure and some data analysis issues. Next, we will
discuss briefly some basic physiology to provide a framework for the techniques discussed and especially
what they measure. Then, we will review some of the literature on these techniques as the techniques
apply to workload. Physiological measures of workload have been recently reviewed (e.g., O'Donnell &
Eggemeier, 1986) and our intent is not to repeat the information already available but to build on it.
Although results will be included, considerable emphasis will be on understanding of the technique and
its application and usefulness in the workload context.

Measurement Criteria


There is a major difference in the focus of physiological research and workload research and evaluation.
The physiologist is interested in mechanisms and the functioning of physiological subsystems.
Accordingly, the study of a physiological subsystem will involve discovering its full range of operation,
including the impact of extreme conditions not normally encountered in everyday life. By contrast, the
workload researcher is interested in relating measurements of the subsystem under more normal
circumstances to measures of human behavior and to workload. As an example, consider pupil diameter.
The full range of pupil function is from about 1 mm in bright light to about 8 mm in total darkness and this
range is what a physiologist would be interested in. Under normal circumstances, those which an operator
is likely to experience, pupil diameter changes show a range less than 1 mm (Beatty, 1982). Clearly, the
range of operation is much more constrained in the workload context.


When one uses a physiological technique to assess OWL there are several types of questions to
answer.


•  Is this physiological technique one that could reflect workload changes? This is an
issue of appropriateness. Is the technique appropriate to the questions being
asked? A large portion of the inconsistency about a technique in the workload
literature may be related to this very point.

•  Are the variations of the physiological technique in the normal operating environment
sufficient to produce measurable workload variations? This is the issue of sensitivity.
Are the techniques sensitive to the variations in OWL the operator will experience?

•  Do these techniques reflect the kinds of changes in the human operator one is
interested in? This is diagnosticity, which is an extension of sensitivity. Are the
techniques employed sufficiently specific to localize the difficulty and identify the
underlying mechanism?


In  one  form  or  another,  the  techniques  we  have  classified  under  physiological  are  those  which  are 
indirect  indicators  of  operator  workload  as  compared  with  primary  task  and  subjective  methods.  They  are 
presumed  to  be  reflective  of  the  amount  and  difficulty  of  the  work  the  operator  is  doing.  This  is  thought  to 
be  true  because  the  bodily  states  vary  as  a  function  of  what  one  is  doing:  waking,  sleeping,  running, 
sitting,  etc.  Similarly,  changes  in  bodily  state  and  especially  brain  states  can  be  measured  and  related  to 
these activities. It is the hope of the investigator that the bodily functions will show similar, measurable
changes, albeit smaller for workload changes than what the physiology studies show.

Appropriateness


Early arousal theory assumed that all motivation and emotion involve the same basic continuum of
physiological activation and that this continuum is reflected in all of the psychophysiological techniques
(e.g., Hebb, 1955; Malmo, 1959). Thus, techniques of electroencephalogram (EEG), electromyography
(EMG), skin conductance, etc. should be interchangeable according to arousal theory. It is now known
that such a simplified view is wrong, but to an extent, the workload literature continues to reflect this
simplified view. Increasingly, researchers are discovering workload to be multifaceted and thus a particular
technique may reveal workload effects in one case but not another. Stated another way, the use of an
inappropriate technique may be misleading; it may be a good technique in some instances, but it was the
wrong tool for the question at hand.

As the task for the operator changes, the most appropriate workload technique for assessing OWL may
also change. Many physiological techniques have been applied to the study of workload. In some cases,
the authors have claimed the technique to have relevance for workload but the relevance is also
contingent on the definition of workload. Because the nature of operator tasks has changed rapidly,
some of the older techniques are more related to fatigue than to workload as defined in Chapter 2. These
have been discussed, briefly. The appropriateness of the technique in the current workload context will
be made apparent in the discussion of individual techniques.

Data Analysis Affects Sensitivity and Diagnosticity


There  are  several  important  implications  of  data  analysis  with  regard  to  sensitivity  and  diagnosticity. 
Since  mental  workload  is  dynamically  changing  over  time,  the  investigator  should  plan  the  study  and  the 
analysis  to  assess  the  timeline  of  operator  activity.  For  instance,  if  one  averages  over  time,  one  should  be 
sure  that  the  data  have  been  examined  first  for  consistent  trends  which  may  occur  with  time;  these  trends 
may  be  linear  or  nonlinear,  depending  on  the  circumstance.  More  will  be  said  about  trends  in  the  section 
on heart rate. While this is not a caveat against averaging, it is a plea for knowledgeable and careful
averaging.  Unfortunately,  there  are  cases  in  the  literature  that  report  a  failure  to  find  an  effect  but  appear 
to  have  masked  the  effect  of  workload  through  averaging. 

A fairly simple, preliminary analysis is to pick some (arbitrary) short time period as a window within which
data are arranged. A computer program can average scores within each window to produce running
averages. This permits the investigator to look for both short and long term trends in the data. The data
can be reanalyzed using several variations including (a) changing the size of the temporal windows or (b)
using temporally overlapping or non-overlapping windows. A considerable amount of information may be
gained from analyses of this type; specifically, sensitivity will be increased.
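
The windowed averaging just described can be sketched in a few lines. This is only an illustration
under stated assumptions: the function name, its parameters, and the scores are hypothetical and are
not taken from any study cited here. Setting the step equal to the window size gives non-overlapping
windows; a smaller step gives temporally overlapping windows.

```python
def windowed_means(samples, window, step):
    """Average `samples` within successive windows of `window` samples.

    step == window gives non-overlapping windows; step < window gives
    temporally overlapping windows. All names here are illustrative.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    means = []
    for start in range(0, len(samples) - window + 1, step):
        chunk = samples[start:start + window]
        means.append(sum(chunk) / window)
    return means


# Synthetic scores (not data from any study): a brief rise in the middle.
scores = [70, 70, 70, 70, 80, 90, 80, 70, 70, 70, 70, 70]

overall = sum(scores) / len(scores)                       # one grand average
short_windows = windowed_means(scores, window=3, step=3)  # non-overlapping
overlapping = windowed_means(scores, window=4, step=2)    # overlapping
```

With these made-up scores the grand average hides the brief rise, while the short windows isolate
it. Rerunning with other window sizes is the kind of reanalysis suggested above.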

Most applied workload studies involve human performance over a period of time. Time, then, is an
especially important factor in the analysis of data obtained with the techniques. Because of the interaction
between and the counteracting effects of the sympathetic and parasympathetic branches of the nervous
system, self-paced tasks are difficult to analyze. It is necessary to have time marks, either recorded by the
experimenter or based on other responses of the operator, to separate types of activities into meaningful
categories. Otherwise, averaging will simply mask any effects of interest. (See Mulder and Mulder [1987]
for further discussion.) Further, diagnosticity will be quite low if there is no way to relate the measured
changes to ongoing behavioral activity.

Given  that  a  technique  is  sensitive,  additional  procedures  can  be  employed  to  improve  diagnosticity. 
One  such  technique  involves  the  use  of  time  marks  recorded  during  data  collection.  These  marks  can  be 
based  on  external  events,  mission  milestones,  stimulus  presentations,  or  operator  responses.  Having 
recorded  some  type  of  mark,  the  analysis  can  be  locked  on  these  marks.  As  will  be  apparent,  such  an 
approach  to  data  collection  and  analysis  can  be  critical  for  both  sensitivity  and  diagnosticity. 
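
As a rough illustration of locking the analysis on recorded time marks, the sketch below cuts a
fixed-length epoch around each mark and averages the epochs point by point. It assumes an evenly
sampled signal and marks stored as sample indices. The function name, parameters, and synthetic
record are assumptions made for the example, not a procedure taken from the studies cited.

```python
def event_locked_average(signal, marks, pre, post):
    """Average fixed-length epochs of `signal` aligned on each time mark.

    signal: list of evenly sampled values
    marks:  sample indices of the recorded time marks
    pre, post: number of samples kept before and after each mark
    Marks too close to either end of the record are skipped.
    """
    epochs = [signal[m - pre:m + post]
              for m in marks
              if m - pre >= 0 and m + post <= len(signal)]
    if not epochs:
        raise ValueError("no complete epochs around the given marks")
    n = len(epochs)
    return [sum(e[i] for e in epochs) / n for i in range(pre + post)]


# Synthetic record: a response of +5 on the two samples starting at each mark.
signal = [0.0] * 40
marks = [5, 17, 30]
for m in marks:
    signal[m] += 5.0
    signal[m + 1] += 5.0

locked = event_locked_average(signal, marks, pre=2, post=4)
grand_mean = sum(signal) / len(signal)   # averaging without regard to marks
```

In this made-up record the mark-locked average recovers the full response, whereas the grand mean
over the whole record reduces it to a small offset.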

Some analysis techniques, such as spectral analysis, require suitably long time segments to do the
analysis. If the experimenter selects or samples a segment which is too short, the results may not be
stable due to the small number of observations. By contrast, if too long an interval is selected, some of the
effects may be masked. Here again, averaging and poor choice of the temporal scale may destroy
potential sensitivity and diagnosticity of a physiological technique.

Physiological  Background 


Earlier, physiological techniques were referred to as a grab bag of techniques. To organize the
techniques and to provide a benchmark for our evaluations and recommendations, we will provide a brief
discussion of the physiology underlying the changes that various techniques are supposed to measure.
This physiological overview will be referred to in the course of discussing workload techniques. It will also
emphasize a clear rejection of the simplified arousal theory view that all physiological techniques were
created equally (e.g., Hebb, 1955). To facilitate understanding for the reader, a schematic showing the
relations of several physiological subsystems is shown in Figure 7-1.

Figure 7-1. Illustration of the schematic relations among various physiological systems. The
technique associated with each system is shown in a box.


The central nervous system (CNS) consists of the brain and the spinal cord. The CNS is actually
composed of a number of distinct, identifiable neurological structures which are somewhat specialized to
perform particular functions. Hence, superimposed on this structure are functional systems. For example,
the language system is composed of a number of functional parts: hearing and analyzing speech,
mediating or understanding the speech, a vocabulary, language rules and the organization and
generation of speech, including not only ordering the words but control of the articulatory mechanisms to
produce the sounds. The case studies of brain damaged patients demonstrate the existence and
separability of these structures. There are a number of specialized cortical neurological structures which
operate in unison to provide the capability of language with each structure contributing to the functional
system.

The electroencephalogram (EEG) and evoked cortical potential (ECP) are techniques which reflect
activity of the CNS and the cortex in particular. Electrodes placed on the scalp over particular neurological
structures will measure very small changes in electrical potential occurring in the brain. Despite the use of
identical recording procedures, the composition of the two techniques is quite different. The EEG
contains a number of waves, some of which reflect general arousal. By contrast, the idea in the ECP is to
average out these arousal waves and study components due to specific stimuli. To continue the
language example, Chapman (1979) and his colleagues presented subjects with words selected along
Osgood's semantic differential dimensions of Evaluative (good-bad), Potency (weak-strong) and Activity
(fast-slow) while recording evoked potentials. He found quite different wave forms, implying differential
brain functioning depending on the semantic dimension and affective meaning of the words. These
results illustrate the level of detail that can be examined using physiological techniques.

Other techniques such as heart rate and pupil diameter are a function of the peripheral nervous
system. The peripheral nervous system consists of all nervous cells outside the CNS including those
entering and leaving the brain and spinal cord. The peripheral system is divided into two parts. The first is
the somatic nervous system which includes sensory nerves from most receptors and motor nerves
(effectors) for skeletal muscles. The second is the autonomic system which includes sensory and motor
nerves serving the heart, glands, and smooth muscles. Eye movements are accomplished by three pairs
of skeletal muscles (somatic system) while pupil dilation is under control of smooth muscles (autonomic
system). Both of these peripheral nervous subsystems are under general control of the CNS. It is the
CNS and the autonomic system that are of principal concern for the measurement of workload.

The autonomic nervous system underlies emotional and motivational behavior. Any feedback system
will have counteracting influences and the autonomic system is no exception. It is divided into two parts
which act in opposition to each other: the sympathetic and the parasympathetic. The sympathetic system
activates the body and the parasympathetic serves to conserve the body. To illustrate, there is clear
physiological evidence that stimulation of the sympathetic will result in heart rate increases and pupil
dilation whereas stimulation of the parasympathetic causes decreases in heart rate and pupil constriction.
Under normal operations, the two systems balance each other. However, an emergency may cause a brief
imbalance which can have several different results depending on the timing of the two systems. Fainting,
for example, is the result of reduced blood flow to the head caused by activation of the sympathetic
system followed by a flood of activity from the parasympathetic.

Our brief discussion of some highlights of physiological function clearly shows the diversity of
information available from the various rather specialized techniques available to measure physiological
functions. Although one could say all measure CNS activity, most do not measure the activity directly. In
the case of body fluids, the technique may be three or four steps removed from CNS events. There are
also timing differences: neural activity is quick, chemistry much slower. Clearly the system is complicated
with lots of antagonistic activity at all times. This implies that physiological techniques may reflect particular
changes. However, if the technique does not show a change, one cannot infer high workload to be
absent. Because of the rapidity of neural activity and the counterbalancing effects of the antagonistic
systems, timing is of the essence. Failure to assess a physiological change associated with workload can
reflect either an inappropriate technique or inappropriate data analysis.


Techniques Measuring Cardiac Responses

The heart is influenced by the autonomic nervous system and through this connection the heart is
related to physical and emotional states. Heart rate is known to be related to the amount of physical activity
(oxygen requirements), respiration, and thermal regulation. It can also be said that any factor which affects
mental activities will also affect the heart. Thus, mental load and task demands will affect (and can be
observed in) cardiac response. However, so can other normal factors such as the orienting response and
the defense response; stressful and surprising events will also be evidenced. Additionally, factors such
as age will result in changes in heart rate variability as well (Mulder & Mulder, 1987). Consequently, heart
rate is a function of a number of forces which may be operating simultaneously.

Heart Rate

Many years ago, Darrow (1929) reviewed studies which seemed to show that looking at simple stimuli
caused heart rate deceleration while stimuli that demanded cognitive processing were
associated with acceleration. Since then, many studies have shown attention to the environment to be
associated with heart rate deceleration. Acceleration is less clear, but certainly there is a relation between
heart rate and the skeletal preparation for movement. Accordingly, unless the investigator is very careful
to separate all of these influences, the increases and decreases may be masked. It seems modern
investigators have had more difficulty with the technique; possibly the increased use of computer
technology has moved the researcher away from the data.

There is some controversy about the OWL implications of heart rate (O'Donnell & Eggemeier, 1986;
Wierwille, 1979). Not all investigators have found consistent results, or even results in the same direction.
Since heart rate also increases with physical activity, one must take care when measuring mental workload
that the technique is not contaminated by high physical activity conditions. Roscoe and Grieve (1986) and
Wierwille and Connor (1983) have independently shown that the technique is sensitive to high
stress/workload in which survival, embarrassment or similar emotions play a role. Similarly, long-term
effects seem to be acknowledged. Sharit and Salvendy (1982), in discussing occupational stress, state,
"The heart rate measure has undoubtedly been proven to be the most versatile measure of stress" (p. 137).
However, some investigators state that unless strong emotions are present, heart rate will not covary with
workload (Hart, 1986a).


A recent report by Bauer, Goldstein, and Stern (1987) provides a departure in procedure from other
studies and also illustrates one of the points made earlier about averaging. As indicated above, some
investigators have failed to find consistent changes in heart rate as a function of task. Bauer et al. (1987),
using the Sternberg task as a secondary task, collected a multiple set of measures that provides an
opportunity to compare various measurement techniques. For data analysis on heart rate, they divided
each trial into 18 time bins consisting of 950 ms each. In only eight out of the 18 time bins did the task
loading manipulation have a significant effect on heart rate. However, heart rate increased and then
decreased as a function of time into the trial. The trial comprised three intervals (cue, memory, and test),
which reflect different rule-based activities. Averaging (which is done in analysis of variance) within just a
six second interval to compare the cue, memory, and test intervals did not yield a significant difference for
the three intervals. There was no effect of task loading but a clear effect of invoking different underlying
processes. It is of note that their evoked cortical potential measure showed an effect of both task
demands and task loading. While heart rate did not reflect task loading very clearly, it certainly showed
clear differences related to what the subject was doing and when, that is, the changing task demands.
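
The bin structure described above can be made concrete with a short sketch. The bin width (950 ms)
and the number of bins (18) come from the text; the grouping into three six-bin intervals follows the
description of the cue, memory, and test intervals. The function names and the heart rate values are
assumptions made purely for illustration and are not the published data.

```python
BIN_MS = 950           # bin width reported in the text
BINS_PER_TRIAL = 18    # bins per trial reported in the text
BINS_PER_INTERVAL = 6  # three equal intervals: cue, memory, test


def bin_index(t_ms):
    """Return the 0-based time bin for a time (ms) measured from trial onset."""
    if not 0 <= t_ms < BIN_MS * BINS_PER_TRIAL:
        raise ValueError("time falls outside the trial")
    return int(t_ms // BIN_MS)


def interval_means(bin_values):
    """Collapse per-bin values into one mean per six-bin interval."""
    return [sum(bin_values[i:i + BINS_PER_INTERVAL]) / BINS_PER_INTERVAL
            for i in range(0, len(bin_values), BINS_PER_INTERVAL)]


trial_s = BIN_MS * BINS_PER_TRIAL / 1000        # length of one trial, seconds
interval_s = BIN_MS * BINS_PER_INTERVAL / 1000  # length of one interval, seconds

# Synthetic per-bin heart rates (beats/min), NOT the published data:
# a rise and fall inside each interval that the interval means cannot show.
per_bin = [72, 74, 76, 76, 74, 72] * 3
collapsed = interval_means(per_bin)
```

Each interval spans 5.7 s, the "six second interval" of the text. With these made-up values the three
interval means are identical even though heart rate changes from bin to bin, which is the masking
effect of averaging the paragraph warns about.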

Heart Rate Variability (Sinus Arrhythmia)


Heart rate variability (sinus arrhythmia) is another workload measure relevant to heart rate data. It has
proven to be equally controversial (O'Donnell & Eggemeier, 1986; Wierwille, 1979). Some of the
inconsistency may be due to quite different analysis techniques; Kalsbeek ([1973], cited by O'Donnell &
Eggemeier, 1986) has reported more than 30 techniques which have been used to determine variability.

Why look at variability? Simply on logical grounds one would expect an increase in heart rate to be
associated with a decrease in variability; after all, there is an upper limit on heart rate. As it turns out, there is
a negative correlation (about -.40) between heart rate increases and heart rate variability. Even though
there is a relation between the two, the fact that the correlation is modest indicates the two measures
reflect somewhat different aspects of the physiological activity.
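
One way to read the size of this correlation is through the proportion of variance the two measures
share, which is the square of the correlation coefficient. Taking the approximate value quoted above:

```latex
r^2 = (-0.40)^2 = 0.16
```

On that reading, roughly 16 percent of the variance is common to the two measures and roughly 84
percent is not, which is consistent with treating them as reflecting somewhat different aspects of
the activity.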

The spectral analysis of heart rate variability provides a method to separate out several frequency
components stemming from different sources and seems to show promise as a measure. One peak,
found around 0.35 Hz, represents respiration and a second peak reported at 0.20 Hz represents heart
activity related to thermal aspects (Sayers, 1973). Some investigators suggest a thermal energy band
from .02 to .06 Hz; arterial pressure from .07 to .14 Hz; and respiratory activity from .15 to .50 Hz (Aasman,
Mulder, & Mulder, 1987). For our purposes the important peak, found around 0.10 Hz, is related to blood
pressure and seems to be correlated with workload (Sayers, 1973).

The early work of Sayers has been followed by an increasing number of studies which show the .10 Hz
component to be an effective indicator of mental activity, but care must be taken to factor out all
confounding variables. Aasman et al. (1987) found a significant effect of task loading (2 or 4 items) in a
continuous memory paradigm; the amplitude of the .10 Hz component decreased as the load on memory
increased. These investigators attribute the change to the amount of effort expended, distinguishing
between mental effort and mental workload (in our terminology, different rules). Workload refers to
dimensions of the perceived task demands and use of resources. Effort refers to what the subject is
doing, the willingness to expend effort in the utilization of the resources. Overload, pushing the operator
outside the workload envelope, however, results in a cessation of effort and a corresponding increase in
the .10 Hz component.
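
A minimal sketch of the band decomposition may help in following these studies. It uses the band
limits quoted above from Aasman et al. (1987), but everything else is an assumption made for
illustration: the function names, the 2 Hz resampling rate, the plain (unwindowed) discrete Fourier
transform, and the synthetic beat times. The published analyses are more careful than this.

```python
import cmath
import math

# Bands quoted in the text (Hz); the dictionary keys are illustrative labels.
BANDS = {"thermal": (0.02, 0.06),
         "pressure": (0.07, 0.14),
         "respiratory": (0.15, 0.50)}


def resample_ibi(beat_times, fs):
    """Linearly interpolate inter-beat intervals onto an even time grid."""
    ibi_times = beat_times[1:]
    ibi = [b - a for a, b in zip(beat_times, beat_times[1:])]
    n = int((ibi_times[-1] - ibi_times[0]) * fs)
    even, k = [], 0
    for i in range(n):
        t = ibi_times[0] + i / fs
        while ibi_times[k + 1] < t:
            k += 1
        frac = (t - ibi_times[k]) / (ibi_times[k + 1] - ibi_times[k])
        even.append(ibi[k] + frac * (ibi[k + 1] - ibi[k]))
    return even


def band_power(beat_times, fs=2.0):
    """Sum the periodogram of the interval series within each band.

    fs is an assumed resampling rate (Hz), not a value from the text.
    """
    even = resample_ibi(beat_times, fs)
    mean = sum(even) / len(even)
    even = [x - mean for x in even]          # remove the mean level
    n = len(even)
    power = {name: 0.0 for name in BANDS}
    for k in range(1, n // 2 + 1):
        freq = k * fs / n                    # frequency of DFT bin k
        for name, (lo, hi) in BANDS.items():
            if lo <= freq <= hi:
                coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                            for i, x in enumerate(even))
                power[name] += abs(coeff) ** 2 / n
    return power


# Synthetic beats: mean interval 0.8 s, modulated at 0.10 Hz (not real data).
beats = [0.0]
while beats[-1] < 300.0:
    t = beats[-1]
    beats.append(t + 0.8 + 0.05 * math.sin(2 * math.pi * 0.10 * t))

power = band_power(beats)
```

For this synthetic record most of the power falls in the .07 to .14 Hz band, as intended. Note that a
record of T seconds yields frequency estimates spaced 1/T Hz apart, so a short segment leaves only
a few estimates inside the narrow band around .10 Hz, which is one reason segment length matters
for this kind of analysis.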

Vincente, Thornton, and Moray (1987), in another recent study, used three levels of difficulty on a
tracking task and had subjects give subjective ratings of difficulty, workload, and effort. Effort was defined
as the amount of attentional demand; difficulty was defined as how hard the motor task was; and workload
was defined as the overall level of demand on the task. These investigators did not find an effect of task
difficulty on the .10 Hz component and only a marginal effect on the subjective estimates of effort.
However, they found a correlation (.66) between the .10 Hz component and the subjects' estimates of
effort. Actually, seven out of eight subjects showed the correlation; the eighth did not. Of interest to the
comments about data analysis, the tracking task is continuous. Timing of performance and the size of time
samples used in the analysis are important (Mulder & Mulder, 1987).

Summary of Heart Measures


Both mean heart rate analysis and heart rate variability (spectral analysis) are based on measures of
heart rate. Accordingly, one has a unique opportunity to extract two measures from a single technique.
Mean heart rate, as indicated by the older literature and some recent studies, shows measurable changes
as a function of task difficulty, quite possibly due to a generalized arousal component. Heart rate variability
(.10 Hz component) often appears to be sensitive to task loading and fatigue (Egelund, 1982; Strasser,
1981). The majority of studies reported in the literature are looking for relatively subtle effects. In practical
application, there may be some situations in which unknown but extreme demands may be made on the
operator; heart measures would detect such situations.

Overall, one is not impressed with the consistency of the results using heart measures. This has led
O'Donnell and Eggemeier (1986) to conclude, "For the present, therefore, heart rate and heart rate
variability must be considered an attractive and promising but unvalidated measure of workload" (p. 42-42).
However, the European research groups (e.g., Mulder & Mulder, 1987; Strasser, 1981) have had
reasonable success when the various confounding factors have been taken into account. Thus, in some
applied situations, heart techniques may be appropriate.

Techniques for Measuring the Eye

Three separate visual structures are of interest in the context of OWL. These are:

•  The movements of the eye, which are controlled by three pairs of muscles (horizontal,
vertical and rotational movements) under control of the eye movement system,

•  The pupil, which is controlled through the autonomic system, and

•  The lids of the eye, which are under control of the somatic system.

Eye Movements and Scanning (Point of Regard)

Eye movements occupy a unique role in information acquisition. Because of the central role of vision
and eye movements in information acquisition, many investigators have focused on information
acquisition strategies reflected in eye scanning patterns to identify the source of information for decisions.

The goal of applied eye movement research has been to determine the scan patterns, how and where
an operator gets information and in turn what he does with it. An assumption normally made is that dwell
time (the length of a look) serves as an index of visual workload: the longer the dwell time, the more
difficult to read the instrument. Current eye movement technology permits the investigator not only to
monitor movement of the eyes but, with appropriate calibration, determine the point of regard, i.e., what
was looked at.
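
To make the dwell time measure concrete, the sketch below derives dwell times from a sequence of
point-of-regard samples. It assumes that each sample has already been labeled with the instrument
being looked at and that samples arrive at a fixed rate. The instrument names, the 10 Hz rate, and the
function names are hypothetical and serve only as an illustration.

```python
from itertools import groupby


def dwell_times(regard_labels, sample_period_s):
    """Return (instrument, dwell_seconds) for each unbroken look.

    regard_labels: per-sample label of what was looked at (hypothetical
                   names such as "altimeter"); None marks an unusable sample
    sample_period_s: time between successive samples, in seconds
    """
    looks = []
    for label, run in groupby(regard_labels):
        n = len(list(run))
        if label is not None:
            looks.append((label, n * sample_period_s))
    return looks


def mean_dwell(looks):
    """Average dwell time per instrument across all looks."""
    by_label = {}
    for label, seconds in looks:
        by_label.setdefault(label, []).append(seconds)
    return {label: sum(v) / len(v) for label, v in by_label.items()}


# Synthetic samples at an assumed 10 Hz; instrument names are illustrative.
samples = (["altimeter"] * 4 + ["airspeed"] * 2 + [None] +
           ["altimeter"] * 6 + ["airspeed"] * 2)
looks = dwell_times(samples, sample_period_s=0.1)
averages = mean_dwell(looks)
```

Under the assumption stated in the text, the instrument with the longer average dwell time in such a
summary would be the candidate for being harder to read.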

In 1903, Dodge used film to record a reflected image of the eye, which is still a useful technique. Since
the time of Dodge, a number of techniques have been developed. (See O'Donnell and Eggemeier
[1986] for a review of those various techniques and Hallett [1986] for a thorough review of eye movement
research.) While each of these techniques can serve a useful research function, few are useful in an
applied context. Helmet mounted cameras filming the eye, much as Dodge did, have also proven to
provide useful diagnostic information for OWL (Wilson, O'Donnell, & Wilson, 1983).

Research shows workload can be predicted from changes in dwell times. These workload changes
result from, for example, the difficulty of reading an instrument (Harris & Glover, 1984) or a change in mode
of flying, autopilot or manual (Spady, 1978b). Waller (1976) showed eye movement data could be used to
predict Cooper-Harper ratings, thus broadening the application of eye movement techniques for OWL
estimation.

Diagnosticity  of  eye  movement  measures  can  be  excellent.  Wilson  et  al.  (1983)  were  able  to  diagnose 
what a pilot was doing even when they could not obtain good evoked potential responses. The eye
movement  technique  measures  visual  workload,  but  manipulations  outside  vision  may  increase  general 
workload  which  can  have  an  effect  on  dwell  times. 

Some eye movement devices can be expensive and require substantial data analysis capability,
although many analysis techniques have been worked out (Harris et al., 1955). The potential of eye
movement measurement as a workload analysis technique is highest among the techniques classified as
physiological; however, it may not be practical at the present time for most Army applications. What can be
of considerable use and much less costly, even though more obtrusive, is the helmet mounted camera -
the Dodge technique.

Pupil  Diameter- Pupil  Dilation 


It  is  well  known  that  pupil  diameter  varies  with  a  number  of  physiological  and  psychological  variables. 
Beatty  (1982)  has  reviewed  the  literature  and  concluded  that  the  task-evoked  pupil  response  reflects 
processing  loads.  In  the  context  of  our  terminology,  it  appears  to  be  sensitive  to  both  rule  changes  and 
task  loading.  For  example,  pupil  diameter  changed  both  as  a  function  of  the  phase  of  task  (listen,  pause, 
report) as well as the memory load (3 to 7 digits) (Beatty, 1982). Similarly, the measure has been shown to
be  sensitive  to  difficulty  of  tasks  such  as  sentence  comprehension  and  visual  tasks  involving  comparison 
of  letter  pairs. 

At present, because of the stringent restrictions on operator movement, the field application of pupil
diameter measurement is minimal. To obtain good, accurate recordings, eye movement must be kept to a
minimum; when the eye is at an oblique angle to the recording device, the two-dimensional image of the
pupil will be foreshortened due to the geometry. Further, because the pupil varies with light levels
independently of workload states, one must be careful to keep ambient light constant to avoid
contamination of the data. It appears possible to remove both of those effects analytically, but this has
not been done.
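
As a hedged illustration of the geometric effect only, the sketch below assumes the pupil can be
modeled as a flat circular disk, so that its image is shortened by the cosine of the angle between the
line of sight and the camera axis. The function names are hypothetical, the flat-disk model is an
assumption of this example rather than a procedure taken from the literature reviewed, and the sketch
does not address corneal refraction or the light-level effect.

```python
import math


def apparent_pupil_width(true_diameter_mm, gaze_angle_deg):
    """Width of the pupil image along the direction of eye rotation.

    Assumes a flat circular pupil viewed at gaze_angle_deg off the camera axis.
    """
    return true_diameter_mm * math.cos(math.radians(gaze_angle_deg))


def corrected_pupil_diameter(measured_width_mm, gaze_angle_deg):
    """Invert the flat-disk model to estimate the true diameter."""
    if abs(gaze_angle_deg) >= 90:
        raise ValueError("gaze angle must be strictly between -90 and 90 degrees")
    return measured_width_mm / math.cos(math.radians(gaze_angle_deg))
```

Under this model, a 5 mm pupil viewed 60 degrees off axis would image at about half its true width,
which indicates why oblique recordings cannot be compared directly with on-axis ones.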

Blink  Rate  and  Latency 


Although blinking is subsumed under eye measures, the somatic motor pathway of the eyelids may be
somewhat different from the motor pathway of the saccadic eye movement (Moses, 1970). There are two
types of blinking, reflex and spontaneous. The spontaneous blink is of interest in the study of workload
and can be conditioned. The duration of a full blink is about 0.3 to 0.4 seconds, and blinks normally
occur about once every 2.8 seconds in men and once every 4.0 seconds in women. Blink rate of the eye
has been measured using electromyography (EMG), sometimes called electro-oculomyography (EOG) when
applied to eye research.

In the study discussed earlier using the Sternberg paradigm, Bauer et al. (1967) also measured
blinking. Their analysis was parallel to that for heart rate and included three measures: blink rate, blink
latency, and blink duration. For data analysis they divided each trial into 18 time bins of 950 ms
each. These bins were also blocked into intervals of six bins each. Blink duration showed a
decrease over bins and an increase over intervals. Blinks occurred earlier (blink latency) following the
cue than following other stimuli. For the blink rate analysis, the bin effect was significant; blink rate
declined with increasing time since the stimulus presentation. Of note, the blink rate declined from one
every two seconds in the first bin to one every six seconds in the last bin. In eight out of the 18 time
bins, the set size task loading had a significant effect on rate, and overall set size had a statistically
significant effect. Blink rate provides a measure directly related to task demands and to task loading.
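
The binning step of such an analysis can be sketched as follows. This is a minimal illustration, not
the authors' procedure: the function name, the input layout (one list of blink onset times per trial,
in milliseconds from stimulus onset), and the sample values in the usage note are assumptions made for
the example.

```python
def blink_rate_per_bin(trials, n_bins, bin_ms=950):
    """Average blink rate (blinks per second) in each post-stimulus time bin.

    trials: list of trials; each trial is a list of blink onset times in ms
            measured from stimulus onset (hypothetical input format).
    n_bins: number of consecutive bins of width bin_ms to analyze.
    """
    if not trials:
        raise ValueError("at least one trial is required")
    counts = [0] * n_bins
    for onsets in trials:
        for t in onsets:
            b = int(t // bin_ms)
            if 0 <= b < n_bins:  # ignore blinks outside the analysis window
                counts[b] += 1
    bin_seconds = bin_ms / 1000.0
    return [c / (len(trials) * bin_seconds) for c in counts]
```

For instance, calling `blink_rate_per_bin([[100, 500, 1000], [200, 2900]], n_bins=4)` counts three
blinks in the first bin, one in the second, none in the third, and one in the fourth, and divides each
count by the total observation time per bin (two trials of 0.95 s each).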

Summary of Eye Techniques

Because vision is a major information acquisition sensory system, many investigators have focused
efforts on determining how the system functions and acquires information under varying workload
conditions. This work has primarily focused on determining the point of gaze or look point of the eye.
The three techniques considered in this section cover three different aspects of the nervous system. Of
the three, the eye movement / point of gaze technique is probably the most important. The data derived
from studies of eye movements have application in a number of workload situations: not only for
instrument panels and computer displays but also for visual search patterns used to detect events and
targets. The cost and effort required to obtain and analyze eye movement data reduce the practical
applicability of these techniques.

Pupil diameter has been shown to be sensitive to workload variations, especially the amount of mental
load. However, measurement techniques do not lend themselves to field situations. These restrictions
limit the technique to the laboratory. Blink rate is a technique that has received less attention, but it
could be useful, especially in conjunction with other measures of eye behavior.

Techniques for Measurement of Brain Activity


Within the past 10 years, a considerable amount of effort has been devoted to identifying measures of
brain activity that are reflective of underlying psychological processes that influence human information
processing and performance (Donchin, Ritter, & McCallum, 1978; Hillyard & Kutas, 1983; Posner, 1978).
Such efforts have offered promising results with respect to identifying brain activity patterns related to
operator workload (e.g., Kramer et al., 1987). With respect to understanding human information
processing and performance, researchers have recognized that measures of brain activity (e.g., cortical
evoked potentials) are complex and their recording and analysis costly. Therefore, one is not likely to use
these measures except when they provide data not easily available with more traditional behavioral
measures (Duncan-Johnson & Donchin, 1982). A somewhat similar note of caution has been offered in
regard to utilizing brain activity measures (e.g., cortical evoked potentials) as indices of mental workload
(Kramer et al., 1987).

Electroencephalogram (EEG): Spectral Analysis

The electroencephalogram (EEG) is typically recorded from surface electrodes placed directly on the
scalp. Such recordings can provide data on the brain's electrical activity during the performance of a task.
Attempts have been made to quantify this electrical activity according to the predominant spectral
frequencies that make up such brainwave activity. The premise is to identify those spectral frequency
bands that are indicative of and reflect changes in workload. The EEG frequency bands that have
received the most attention are 4-7 Hz (Theta), 8-12 Hz (Alpha), and 18-30 Hz (Beta).
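
To make the idea of band quantification concrete, the following sketch computes the share of signal
power falling in each of the three bands listed above, using a direct discrete Fourier transform from
the Python standard library. It is a simplified illustration under stated assumptions: the function
names are hypothetical, no windowing or artifact rejection is applied, and power is expressed relative
to the sum of the three listed bands rather than to the whole spectrum.

```python
import cmath
import math

# Band edges in Hz, as listed in the text above.
BANDS_HZ = {"theta": (4.0, 7.0), "alpha": (8.0, 12.0), "beta": (18.0, 30.0)}


def band_powers(samples, fs_hz, bands=BANDS_HZ):
    """Sum of squared DFT magnitudes within each band (mean removed first)."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]
    power = {name: 0.0 for name in bands}
    for k in range(1, n // 2 + 1):
        freq = k * fs_hz / n
        hits = [name for name, (lo, hi) in bands.items() if lo <= freq <= hi]
        if not hits:
            continue  # skip DFT bins outside every band of interest
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centered))
        for name in hits:
            power[name] += abs(coeff) ** 2
    return power


def relative_band_powers(samples, fs_hz, bands=BANDS_HZ):
    """Each band's power as a fraction of the total over the listed bands."""
    power = band_powers(samples, fs_hz, bands)
    total = sum(power.values())
    return {name: (p / total if total > 0 else 0.0) for name, p in power.items()}
```

A recording dominated by 10 Hz activity would yield a large Alpha fraction under this scheme; a shift
of that fraction over the course of a session is the kind of change the studies below examine.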

In general, the findings support the conclusion that the percentage of low frequency EEG spectral
bands (i.e., Alpha) increases during the course of prolonged and continuous performance (Parasuraman,
1984). Such findings have been interpreted as indicative of lowered arousal levels over time (Gale, 1977;
O'Hanlon & Beatty, 1977). As a result, EEG spectral changes have been seen as reflecting general state
changes within an individual (e.g., drowsy, alert). However, the relationship between these general state
changes as shown by EEG spectral analysis and operator workload as indexed by performance changes is
not always clear. For example, Gale, Davies and Smallbone (1977) used a simulated radar type task to
show that subjects' performance declined as measured by reaction time (RT increased) during the course
of prolonged performance, which was accompanied by corresponding increases in the amount of the
7.5-9.5 Hz EEG spectral band (i.e., decrement in physiological arousal). However, similar changes in the
EEG have been reported by Fruhstorfer and Bergstrom (1969) when subjects were relaxed and performed
no task for a comparable period of time. The EEG spectral analysis approach has therefore been seen as
indicative of organismic states which may or may not interact with or reflect workload (e.g., fatigue,
boredom).

To illustrate this point, Howitt, Hay, Shergold and Ferres (1978) examined EEG changes for a single
pilot during actual flights in a small two-engine transport aircraft. The flights were performed either as
the first flight of the day, after a night of sleep deprivation, or after a series of daytime flights to assess
sustained performance over the course of a workday. Each flight was considered to contain segments of
differing levels of workload (e.g., single engine takeoff vs. maintaining steady level flight). Results
showed a decrease in amplitude of EEG activity across several spectral bands (e.g., 8-12 Hz and 12-16
Hz) when sleep deprived flights or end of the day flights were compared to first of the day flights. These
EEG changes were seen as reflecting organismic changes resulting from sleep deprivation (e.g.,
sleepiness) or prolonged work (e.g., fatigue). However, when comparisons were made among in-flight
segments of different workload levels, only the first of the day flights showed evidence in the EEG of
reflecting workload levels (e.g., increased EEG amplitude for spectral bands with a concomitant increase
in workload activity). By contrast, sleep deprived flights and end of day flights showed no signs in the
EEG that were reflective of changes in workload levels during these flights.

Summary. EEG spectral analysis seems to offer a means to assess changes in organismic states
within an operator (e.g., fatigue, sleepiness) that may or may not show in performance. As a direct
measure of workload, EEG spectral analysis is not an advantageous technique. Other researchers have
voiced similar opinions (e.g., O'Donnell & Eggemeier, 1986).

Evoked Cortical Potentials (ECPs)


Usually, brain wave activity as measured by electroencephalography (EEG) reveals little in the way of
discriminable patterns that can be attributed to operator workload. However, signal analysis techniques
can be utilized to isolate specific brain wave patterns that are responses to external stimuli and may be
used to reflect operator workload levels (e.g., Isreal, Wickens, Chesney & Donchin, 1980). These brain
wave patterns found in response to external stimuli are called Evoked Cortical Potentials (ECPs) or
Event-Related Potentials (ERPs).

The value of the ECP is based on the concept that brain waves reflect a combination of human sensory
inputs (e.g., external stimuli/events) and cognitive processing (e.g., evaluating external stimuli/events).
For example, when a stimulus is presented to the operator, a portion of the brain wave activity is a
response associated with that stimulus. The remaining brain wave activity is considered as ongoing,
unsynchronized, spontaneous activity that is not necessarily associated with the processing of such
stimuli. By performing ensemble averaging across the time intervals following the multiple presentations
of the stimulus, the ECP associated with such stimuli will be enhanced through this averaging while the
spontaneous brain activity occurring in these time intervals will tend to cancel out. Figure 7-2 depicts
the relation between ongoing EEG activity, external auditory stimuli and signal analysis techniques used
to extract the ECP associated with such stimuli.
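
Ensemble averaging as described above can be sketched in a few lines. The example is a simulation
under stated assumptions, not a recording pipeline: each epoch is taken to be the same stimulus-locked
waveform plus independent zero-mean noise, and the helper names (`simulate_epochs`, `rms_error`) and
all parameter values are hypothetical.

```python
import math
import random


def ensemble_average(epochs):
    """Point-by-point mean of equal-length, stimulus-locked epochs."""
    n_epochs = len(epochs)
    n_samples = len(epochs[0])
    if any(len(e) != n_samples for e in epochs):
        raise ValueError("all epochs must have the same number of samples")
    return [sum(e[i] for e in epochs) / n_epochs for i in range(n_samples)]


def simulate_epochs(template, n_epochs, noise_sd, seed=0):
    """Hypothetical data: a fixed evoked waveform buried in Gaussian noise."""
    rng = random.Random(seed)
    return [[v + rng.gauss(0.0, noise_sd) for v in template]
            for _ in range(n_epochs)]


def rms_error(a, b):
    """Root-mean-square difference between two equal-length waveforms."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


if __name__ == "__main__":
    # Illustrative template: a positive peak centered 300 ms after onset,
    # sampled every 10 ms for 600 ms.
    template = [5.0 * math.exp(-((t - 300) / 50.0) ** 2 / 2)
                for t in range(0, 610, 10)]
    epochs = simulate_epochs(template, n_epochs=100, noise_sd=10.0)
    print("single epoch error:", rms_error(epochs[0], template))
    print("averaged error:    ", rms_error(ensemble_average(epochs), template))
```

Because the noise is independent from one presentation to the next, its contribution to the average
shrinks roughly with the square root of the number of epochs, while the stimulus-locked component is
preserved. This is why many stimulus presentations are needed before a component becomes visible.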

ECP Components. As seen in Figure 7-2, the ECP is a complex waveform. It exhibits several
components that are identified as either negative (N) or positive (P) peaks. In addition, these negative and
positive components are further identified by their time course as measured from the external eliciting
stimulus onset to their mean latency of occurrence (e.g., the P300 is a positive waveform component
occurring at approximately 300 msec after stimulus onset).


Figure 7-2. Depiction of the relations between EEG activity, external auditory stimuli, and signal analysis
techniques used to extract the ERP associated with such stimuli. (Adapted from Hillyard & Kutas [1983]).

The early occurring components of the ECP, less than 250 ms from the onset of the external stimulus,
have been characterized as being responsive to the physical nature of the external stimuli used to
generate the ECP. For example, visual stimuli have elicited ECPs with identifiable early components that
seem sensitive to manipulations of the physical parameters of such stimuli with respect to brightness
(P200; Wastell & Kleinman, 1980), spatial orientation (N125; Harter, Previc, & Towle, 1979) and contour
(N1C0-235; Harter & Guido, 1980). Such ECP components are classified as exogenous (i.e., stimulus
bound) since they are sensitive to the physical attributes of the stimuli (i.e., intensity, modality, and rate
of presentation).

The later components of the ECP, those beyond 250 msec from the onset of the external eliciting
stimulus, are considered to reflect active cognitive processing of stimulus information. These ECP
components seem to be sensitive to changes in the processing demands of the task imposed on the
operator but not to changes in the physical characteristics of the eliciting external stimuli (Sutton,
Braren, Zubin, & John, 1965). These later occurring ECP components have been classified as endogenous
components. The ECP endogenous component that has received the greatest attention is the positive
waveform occurring approximately 300 msec after the external eliciting stimulus onset (P300). The P300
has been examined as a measure to reflect cognitive processing activities as well as a measure to reflect
workload levels. (See Pritchard, 1981 for a comprehensive review of the P300 literature.)

P300 and Cognitive Processing. The P300 waveform exhibits systematic changes in latency and
amplitude that are used as evidence for its sensitivity to aspects of human information processing. In
general, the P300 amplitude seems to be sensitive to the task relevance and the subjective probability of
the eliciting external stimuli (Duncan-Johnson & Donchin, 1977). For example, the P300s elicited by task
relevant stimuli are larger in amplitude than the P300s elicited by stimuli not relevant for the task to be
performed (Roth, Ford, & Kopell, 1978).

The P300 latency appears sensitive to the time required to recognize and evaluate task relevant stimuli
(Kutas, McCarthy & Donchin, 1977). That is, the P300 latency reflects stimulus evaluation time in the
sense that identification and evaluation of a stimulus must be completed before the P300 is observed
(Pritchard, 1981). This relationship between P300 latency and stimulus evaluation has been
demonstrated to be independent of response selection and execution processes. McCarthy and Donchin
(1981) manipulated stimulus evaluation time by embedding a target word (P300 eliciting event) either in a
matrix of # signs or within a confusable background of letters. Response selection was manipulated by
changing the compatibility between the target word (right or left) and the responding hand. It was found
that both visually distracting stimuli and stimulus-response incompatibility increased reaction time to the
target words. Only the presence of the distracting stimulus backgrounds (letter backgrounds) had a
significant effect on P300 latency (i.e., more evaluation time was needed to identify target words).

P300 and OWL. The findings just cited provide evidence that the P300 is sensitive to aspects of
cognitive processing. Further, the relevance of P300 measures (amplitude and latency) to operator
workload has been demonstrated in a series of studies conducted at the Cognitive Psychophysiology
Laboratory at the University of Illinois. For example, Isreal, Wickens, Chesney and Donchin (1980)
examined a display-monitoring task in which operators monitored 4 or 8 targets that moved across a
television screen. Half of the targets were square-shaped objects and half were triangular-shaped
objects. Operators were required to monitor one class of targets (squares or triangles) and to detect
changes in either direction of movement or brightness. A secondary task was also required of operators.
Operators were required to listen for high and low frequency tones that were presented during the
performance of the display-monitoring task. They were instructed to count to themselves the number of
times the high pitch tones (lower probability of occurrence than the low-pitch tones) were presented
during the course of trial runs. They were told to report this number at the end of the trial run. The P300
elicited by the rarer of the two auditory tones was used as a measure of operator workload levels. The
concept behind such a measurement scheme is that the primary task will occupy operators' perceptual
resources as a function of the primary task demands. More perceptual resources are needed to monitor 8
moving targets than 4 moving targets. As a result of this manipulation, the perceptual resources
available to detect high frequency tones under high primary task demand will be less than under low
primary task demand, and this difference will be reflected in the P300s to such tones. The results of the
study supported such a measurement scheme. The P300 elicited under control conditions (no primary
task, counting of tones only) was highest in amplitude. This was followed next by the low perceptual
demand condition (4 targets to monitor). Finally, the high perceptual demand condition (8 targets to
monitor) was lowest in P300 amplitude. The conclusion to be drawn is that the P300 seems sensitive to
perceptual task demands (i.e., workload).

The use of the relatively nonintrusive secondary task just described (auditory monitoring to detect
infrequently occurring tones) has been called the oddball paradigm (Donchin, 1981). The oddball
paradigm has been employed in several workload studies with similar results being reported. For example,
Isreal, Chesney, Wickens, and Donchin (1980) and Wickens, Kramer, Vanasse, and Donchin (1983) have
reported that the P300 amplitude elicited by the secondary task (oddball paradigm) decreased as the
perceptual task demands of the primary task increased.
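
The stimulus side of the oddball paradigm can be sketched as a random tone sequence in which one tone
is rare. This is an illustrative sketch only: the function names are hypothetical, and the probability
of the rare tone is left as a parameter because the studies cited above do not state it in this review.

```python
import random


def oddball_sequence(n_tones, p_rare, seed=None):
    """Random sequence of 'high' (rare) and 'low' (frequent) tones.

    p_rare is the assumed probability of the rare high-pitch tone.
    """
    if not 0.0 <= p_rare <= 1.0:
        raise ValueError("p_rare must lie between 0 and 1")
    rng = random.Random(seed)
    return ["high" if rng.random() < p_rare else "low" for _ in range(n_tones)]


def count_rare(sequence):
    """The tally the operator is asked to keep: number of high-pitch tones."""
    return sequence.count("high")
```

In an experiment of this kind, the epochs following each "high" tone would be the ones averaged to
obtain the P300, and the operator's reported count could be checked against `count_rare`.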

Further evidence in support of P300 sensitivity to OWL has been reported by Wickens et al. (1983).
They were able to demonstrate this with a primary tracking task in which discrete displacements of the
tracking cursor served as the eliciting stimulus. The P300s associated with such a primary task, in
contrast to those elicited by the secondary task, increased in amplitude as the perceptual demands of the
task (i.e., operator workload) increased. Kramer, Wickens and Donchin (1985) have reported similar
results with P300s elicited by a primary tracking task.

The evidence presented in support of the P300 as a measure of OWL in this review has been confined
to controlled laboratory situations. However, there have been attempts to record ECPs with the use of the
oddball paradigm in simulation type environments. Kramer et al. (1987) elicited P300s by means of the
oddball paradigm while student pilots flew a series of instrument flight rule missions in a single-engine,
fixed-base simulator. The flights varied in difficulty. The P300 amplitude discriminated between flights
such that the more difficult flight mission elicited P300s lower in amplitude for the secondary task than
the easier one. However, within-flight primary task demands (for example, takeoff, straight and level
flight, holding pattern and landing) were not distinguishable by the P300 amplitude. Natani and Gomer
(1981) have reported similar success in using the oddball paradigm to elicit P300s with a low-fidelity
flight simulation such that P300 amplitudes varied as a function of workload levels.

The P300 latency measure provides an accurate and reliable means to assess the time needed to
identify and evaluate a stimulus prior to making a response. In addition, the P300 latency seems to be
independent of response selection and execution processes (McCarthy & Donchin, 1981). As a result,
the P300 latency can be used to determine the locus of performance changes that may occur. That is, if
P300 latencies vary systematically with performance changes, one may conclude that identification and
evaluation of stimuli are contributing significantly to performance changes such as increased reaction
times. However, if P300 latencies remain stable while performance changes, such changes are
not likely due to identification and evaluation processing. To illustrate, Gomer, Spicuzza and O'Donnell
(1976) reported a study in which subjects performed the Sternberg memory-scanning task (Sternberg,
1969). Subjects were presented with probe letters of the alphabet (ECPs were elicited from these stimuli)
and were asked to identify whether the probes were members of a previously memorized positive set of
letters. Memory load was manipulated by changing the number of letters in the memorized set. Both
reaction time and P300 latency increased linearly as a function of memory set size for positive probe
items. Such results support the inference that stimulus evaluation time (i.e., memory scanning)
contributes greatly to reaction time scores in the Sternberg paradigm. In contrast, Duncan-Johnson and
Kopell (1981) found that the Stroop effect (i.e., people respond more slowly to color words printed in a
different color than to those printed in the same color, e.g., blue printed in the color red as opposed to
blue printed in the color blue) was mainly due to response incompatibility rather than perceptual
interference (i.e., prolonged stimulus evaluation time). With the standard Stroop task, reaction time
scores showed the usual interference between hue and word meaning. The P300 latencies elicited by such
words, however, remained invariant.

Summary. The use of the Evoked Cortical Potential (e.g., P300) as an index of workload must be
recognized as a highly specialized technique that requires a staff of highly trained personnel familiar with
the recording techniques. There is also a need for expensive equipment and sophisticated software for
the recording and analysis of the data generated. Beyond these considerations, there are other important
technical as well as theoretical issues that may limit the applicability of using ECP as a measure of OWL:

• The ECP technique is based on producing an ECP in response to some time-locked
repetitive stimulus event. Such eliciting ECP stimuli are usually controlled by the
experimenter and are presented as secondary task stimuli (e.g., oddball paradigm). It
is possible that in some system applications (e.g., field testing and evaluation) such a
stimulus would represent a form of intrusion and possible distraction to the operator.
It may also not be possible to implement such a controlled type situation for some
system applications.

• The ECP technique requires the use of electrodes and, in some cases, associated
restraints are needed to reduce artifacts (e.g., eye movements that may contaminate
visual evoked cortical potentials). As a result, the applicability of the ECP technique
may be limited to controlled laboratory situations. To illustrate this point, Wilson et al.
(1963) conducted a study with 12 A-10 tactical air command pilots. The study
involved the implementation of various simulator emergency conditions, whereby
single evoked cortical potentials to auditory probe stimuli were recorded
simultaneously with the occurrence of the simulated emergencies. Only three pilots'
ECP data could be used out of 12 pilots. Artifacts in the EEG data of one pilot
resulted in his rejection, and the other pilots were rejected because their
ECPs failed to meet the ECP criteria for discriminability required for inclusion in the
data analysis. Such results point out the fragile nature of such recordings.

• ECP results may not show a strong relationship to other OWL measures. As a result,
ECP data may be difficult to interpret with respect to their significance and implications
toward system design decisions. For example, Biferno (1985) reported a study
whereby subjects performed a compensatory tracking primary task and ECPs were
elicited from auditory stimuli that were the call-signs designated for each participant.
In addition, each subject filled out the NASA Bipolar scales to index subjective
workload. The results were such that 4 out of 20 subjects exhibited significant
correlations between P300 amplitudes elicited by their auditory call-sign and their
weighted workload ratings. With only four significant correlations out of 20, the
results are not encouraging with respect to a relationship between P300 amplitude
and subjective workload ratings.

•  Studies  that  have  shown  a  relationship  between  ECP  components  (e.g.,  P300)  and 
operator  workload  have  been  limited  mostly  to  primary  tasks  that  can  be  characterized 
as  tracking  type  tasks.  It  therefore  remains  to  be  demonstrated  that  the  ECP 
technique  is  applicable  to  other  kinds  of  primary  tasks  that  are  now  required  of 
operators  because  of  the  advancement  of  technology  (e.g.,  decision  type  tasks,  data 
management  and  data  fusion  type  tasks,  and  communications  type  tasks.) 


Blood  Pressure 


Blood pressure reflects both cardiac output and vasomotor consequences of dilation and constriction
of the blood vessels. (The vasomotor response serves two functions: to maintain body temperature and
to direct blood flow to local areas.) The more blood pumped by the heart and the more resistance the
blood encounters in the vessels, the higher the blood pressure. Sympathetic activity tends to increase
blood pressure by increasing heart rate and causing vasoconstriction.

Summary. Several studies have reported blood pressure changes with workload. Ettema (1969)
showed relatively small effects over a short term, but over a long term the pressure increased
substantially. Similar results were reported by Ettema and Zielhuis (1971), who used auditory reaction
time for the task. Nevertheless, the measure is not recommended for workload. One major limiting factor
is that this measurement requires the operator to sit still to get quality measurements. Further, blood
pressure is a function of heart rate. One could eliminate a step in the physiological chain and measure
heart rate directly.

Galvanic Skin Response (GSR): Skin Conductance - Skin Impedance

The galvanic skin response (GSR) is the measure of the resistance of the skin to the flow of electrical
current. The resistance of the skin will change with the degree of production of the sweat glands, which
are innervated by the sympathetic system. GSR is measured by applying a weak current through the skin
and measuring the resistance. (Conductance can be obtained by taking the reciprocal of resistance.)
Electrodes are usually placed on the palm or on the wrist. Skin potential is a related measure which is
often used in modern research.
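
The reciprocal relation between resistance and conductance noted above can be written as a one-line
conversion. The function name and the choice of units (kilohms in, microsiemens out) are assumptions
made for this illustration.

```python
def conductance_microsiemens(resistance_kilohm):
    """Skin conductance in microsiemens from skin resistance in kilohms.

    Conductance G = 1 / R; with R in kilohms, G in microsiemens is 1000 / R.
    """
    if resistance_kilohm <= 0:
        raise ValueError("resistance must be positive")
    return 1000.0 / resistance_kilohm
```

For example, a skin resistance of 200 kilohms corresponds to a conductance of 5 microsiemens, and a
drop in resistance (more sweat gland activity) appears as a rise in conductance.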

There is a large psychology literature employing the technique; however, not much has been done in
the workload context. O'Donnell and Eggemeier (1986) do not even review the technique and Wierwille
(1979) only discusses a few reports. For example, Kroese and Siddle (1983) studied workload while
measuring GSR. They varied the stimulus presentation rate of digits; the task was to pick out odd and
even sequences. GSR was measured on irrelevant tones presented during the task. They showed skin
conductance to decline (habituate) more slowly for higher workload conditions (faster stimulus rates).
The fact that the GSR habituates with repeated presentation of stimuli makes it less suitable for workload
research and evaluation than many other techniques. The habituation in amplitude of the response may
vary with workload, but it has to be measured over a series of presentations, and then a new, novel
stimulus must be presented. It might be useful for perceived emergency conditions.

Summary. GSR has been shown to be related to short-term general arousal effects. Sensitivity is
reasonable; diagnosticity is low. For both theoretical and practical reasons, it is not recommended as a
preferred technique in OWL assessment.

Electromyography (EMG) (Muscle Potential)

General arousal theory would claim that an increase in mental activity would be accompanied by an
increase in muscle tension. Electromyography is used to provide a measure of muscle tension and
activity. It is, however, a measure of somatic rather than autonomic nervous system activity, and because
of this it is a rather indirect measure of workload.

Muscular tension is related to both physical and mental activity. Indeed, dealing with inappropriate
muscle tension is one of the more common approaches to athletic psychology (Nideffer, 1976). In tennis,
for example, missing the first serve may cause the player to "tense up" with the result that the muscles are
tighter and the toss on the second serve is not as high. The consequence often is that the second serve
is also missed. Clearly, mental activity has caused a change in muscle state.

The electrical potential created by motor units of the muscle reflects both the force exerted by the
muscle and the tension in the muscle. This can be measured by implanting electrodes in the muscle or,
more feasibly, by measuring the surface potential. In physical work, it is believed there is essentially a
linear relation between muscle activity and the recorded potential. This permits the measurement of both
(a) immediate work (forces exerted) and (b) long-term activity. In the former case, the absolute forces
required to move or operate can be measured. In the latter case, temporal analysis of speed and degree
of shift will show different spectral characteristics.
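
As a minimal sketch of how the assumed linear relation between recorded potential and exerted force could be
used, the following Python fragment fits a straight line to calibration pairs and then estimates force from a
new reading. The function names and the calibration values are hypothetical and purely illustrative; they are
not data from the studies cited here.

```python
from statistics import mean


def fit_line(amplitudes, forces):
    """Least-squares fit of force = slope * amplitude + intercept."""
    a_bar, f_bar = mean(amplitudes), mean(forces)
    sxx = sum((a - a_bar) ** 2 for a in amplitudes)
    sxy = sum((a - a_bar) * (f - f_bar) for a, f in zip(amplitudes, forces))
    slope = sxy / sxx
    return slope, f_bar - slope * a_bar


def estimate_force(amplitude, slope, intercept):
    """Apply the fitted line to a new surface-potential reading."""
    return slope * amplitude + intercept


# Hypothetical calibration pairs: surface EMG amplitude (microvolts)
# recorded while the operator exerts known forces (newtons).
CAL_AMPLITUDE_UV = [10.0, 20.0, 30.0, 40.0]
CAL_FORCE_N = [25.0, 45.0, 65.0, 85.0]

slope, intercept = fit_line(CAL_AMPLITUDE_UV, CAL_FORCE_N)
```

With such a calibration in hand, `estimate_force(reading, slope, intercept)` converts later readings into
approximate forces, which is only meaningful to the extent that the linear assumption holds for the muscle
and task in question.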

Summary. There appears to have been little research using this technique in the last ten years.
Wierwille (1979) reviewed a few studies which show increased tension to be correlated with increased
workload; O'Donnell and Eggemeier (1986) reviewed the same studies and came to a similar conclusion.
Although the technique reflects workload changes, it measures the somatic system
and is only secondarily tuned to mental workload. There are also more practical ways of measuring
physical activity, such as videotaping movements and analyzing them later.

Critical Flicker Frequency (CFF)

CFF is that transition frequency at which a flickering light passes into perceived steady state, fusion. A
tremendous amount of research has gone into this phenomenon over the last century. Brown (1965) has
reviewed much of the work on intermittent illumination up to the date of his review. [See Watson (1966) for
a thorough discussion of this approach as well as a current review of temporal sensitivity.] The relative
importance of the phenomenon for this report is, of course, the application of the technique for measuring
workload.

CFF is a diffuse but direct measure of CNS functioning. CFF occurs at frequencies between 50 and 70
Hz depending on contrast and illumination (Brown, 1965; Watson, 1966). Because cells below cortical
level have been shown to have capabilities to respond to stimuli at much higher frequencies than
behavioral CFF, it has been taken to be an index of physiological functioning of the cortex. Further
specification is provided in a study by Wilson and O'Donnell (1986) which isolates several aspects of CFF
with respect to physiological functioning. The procedures used are too complicated and specialized for
applied work; nevertheless, the results are related to the diagnosticity of CFF. They used steady state
evoked potential to separate out three frequency ranges of flickering stimuli. These ranges are centered
at approximately 10 Hz (low), 18 Hz, and 50 Hz (high), each with differing amplitudes of the averaged
signal, following from the work of Regan (197?). These results indicate that high frequency transmission
is related to sensory-motor portions of the scanning task while the medium frequency is related to
cognitive portions. This work suggests that CFF changes are related to sensory functions of the CNS.

Wierwille (1979) reviews one study which suggests CFF changes are related to fatigue but not
cognitive workload in any direct or consistent manner. Oshima (1981) has summarized his work on CFF as
a measure of mental fatigue. Most of this work is in Japanese and, therefore, procedural details are not
readily available. However, he suggests CFF is an effective technique to measure fatigue. He also shows
substantial variation as a function of diurnal rhythm. Brown (1965) also reports effects of diurnal rhythm in
his review. Fatigue, anoxia, effects of drugs, state of arousal, and age are among other factors shown to
influence the CFF (Brown, 1965). Wilson and O'Donnell (1986) have shown a unique and highly stable
response to flicker for several individuals over several years.

Summary. The CFF technique can be applied in a short period of time. In general, psychophysical
measurement tends to be quite reliable and stable when extraneous factors are controlled. However, care
must be taken with the technique to evaluate all of the factors which have been shown to influence CFF.
Changes in CFF can be due to a number of variables, but when these are factored out, it appears to be a
broad index of the efficiency of CNS functioning, especially the sensory component. It could be used
effectively to evaluate long-term effects of workload during sustained operations and the depletion of
resources.


Body Fluid Analysis

Body fluid analysis is one of the few techniques available for the assessment of sustained or long-term
effects of workload. Three body fluids are known to change their chemical composition as a function of
long-term workload and stress: blood, urine, and saliva. Recent work has concentrated on urine and
saliva because these two can be obtained relatively more easily than blood samples. Periodic urine
collections may be difficult to accomplish because of requirements to produce on demand. Both urine
and salivary fluids may be particularly difficult to obtain just after intense stress. Indeed, the
psychoendocrine approach has been adopted by many researchers in stress research. Sharit and
Salvendy (1982) provide a summary of both theoretical and empirical work using the approach.

The compounds typically assayed involve both sympathetic nervous system and bodily metabolic
functions. According to Wierwille (1979), the concentrations of compounds in the urine or parotid fluid
that are examined and their indications are:

• norepinephrine - sympathetic nervous system activity,

• epinephrine - sympathetic nervous system and adrenomedullary activity,

• 17-hydroxycorticosteroids (17-OHCS) - adrenocortical activity,

• urea - protein metabolism,

• sodium - mineral metabolism,

• potassium - mineral metabolism, and

• sodium to potassium ratio - metabolic balance.

The usual procedure is to gather samples before, during, and after a prolonged task. The samples are
then analyzed chemically for concentrations of compounds suspected to be related to high workload.
Timing of the collection of the fluids may be critical when measuring the sympathetically induced changes.
This timing issue is less critical for physical activity and the metabolic measures. The technique is believed
to be sensitive to prolonged stress and strain. It is also likely to be sensitive to physical workload,
particularly for compounds associated with sodium, potassium, and urea. The technique is useful for
assessing possible long-term effects but is not recommended for short-term effects (Wierwille, 1979).

An alternative to using body fluid analysis might be to use a subjective method, in particular a mood
scale.  Frequently  used  in  stress  research,  the  mood  scale  offers  a  reasonable  alternative  to  the  chemical 
assay  method,  reduces  the  resource  requirements,  and  can  be  administered  relatively  quickly. 

Overall Summary

Physiological techniques assess a variety of physiological subsystems which are directly or indirectly
influenced by workload variations. Some of these techniques are highly specialized to examine
particular parts of the system during high workload. Because of the rapidity of nervous system activity and
the counterbalancing effects of antagonistic systems, the timing of measurements is critical. Every
technique reviewed has been shown to be sensitive to workload and almost every technique has been
shown to have failures. One of the important aspects involved when applying any physiological technique
is the recognition that various subsystems operate in opposition. Accordingly, data analysis plays an
important role in the success or failure of OWL assessment for many of the techniques.

A number of physiological techniques have been used in the evaluation of workload. The discussion
can be summarized into four broad categories:

• Heart.

Mean rate. Heart rate has been shown to be sensitive to workload variations. The
technique is controversial, but certainly will reflect high stress/workload.

Variability (sinus arrhythmia). Also controversial and a technique which requires care in
data analysis, heart rate variability has been shown to be sensitive to workload.

• Eye.

Eye movement measurement is the most promising physiological technique in the
applied context. Much of the basis for usefulness of the technique rests on the high
degree of reliance on visual information in modern systems. Dwell times give an
indication of importance and/or the difficulty of interpreting an instrument or display.
The technique does, however, require considerable resources. As indicated by a
considerable body of research, eye movement techniques are certainly sensitive and
have a capability for diagnostic information for OWL (Harris, Glover, & Spady, 1988).

Pupil dilation has been shown to be sensitive to workload variations; however,
restrictions required to obtain clean measurement limit the field application of the
technique.

Blink rate and associated measures such as latency have been shown to be sensitive
to workload variations.

•  EEG/ECP. 

These  two  techniques  measure  electrical  activity  of  the  brain.  While  they  have  been 
used  quite  successfully  in  the  laboratory  to  assess  cognitive  states  and  their  relation 
to OWL, the techniques require considerable resources and can be difficult to
implement  in  field  situations. 

•  Other  Techniques. 

Blood pressure. This measure is not recommended because of the confounding of
cardiac output and temperature regulation.

Galvanic skin response (GSR). This has been shown to be sensitive to mental load;
however, the effect is one of slower habituation. This tends to make it less useful as a
workload technique.

Electromyography (EMG). Increased muscle tension may be an immediate
consequence of increased workload, but not necessarily. For the purpose of
measuring longer term physical work, the technique would be useful.

Critical flicker frequency (CFF). CFF can be measured easily and reliably. It appears to
be sensitive to longer term effects, especially for the sensory system.

Body fluid analysis. This is a general technique which can be used to detect long
term effects of workload and stress.

CHAPTER 8. MATCHING MODEL


The purpose of the matching model is to assist the user in selecting OWL measures for the Army
system to be analyzed. The goal is to use all of the information available in the best way possible to match
the requirements of the user with characteristics of the OWL techniques. The analysis of interest to the
user may be for an Army system going through the traditional materiel acquisition process (MAP), or
through Army Streamlined Acquisition Process (ASAP), Product Improvement Program (PIP),
Preplanned Product Improvement (P3I), or Non-Developmental Item (NDI) procurement. One reason for
the Matching Model is the complicated nature of the OWL measure selection process. Another important
reason for the matching model is to take into account the needs and requirements of the user, and the
intended application of the results.

It has been suggested that the Army does not have sufficient human factors personnel available to deal
with any but the most pressing operator workload issues. This was partly revealed in Army interviews (Hill,
Lysaght et al., 1987). Further, with the emergence of MANPRINT, there is an even greater need and
demand for human factors analysis in general and OWL analysis in particular. Clearly there is need for
more expertise and greater distribution of OWL information within the Army community. The question
then is how to provide such expertise within existing frameworks and organizational structure. While there
are a number of alternative solutions such as bringing in more experts, by far the best alternative (and least
expensive) is to use a computerized Expert System approach. An Expert System, for present purposes,
is a method of formalizing the considerations involved in selecting OWL measures to apply to analysis of
Army systems in various stages of development.

When one calls in an expert, one expects to get answers to the problem at hand. No answers are
possible, however, without clearly stated questions. Hence, the expert will often begin by asking a host of
questions, starting with very general issues and gradually asking about more and more detail, finally
coming up with one or more suggestions. The thought processes generally follow a relatively consistent
line whether the expert be Sherlock Holmes solving a mystery, Einstein developing relativity theory, or a
practitioner developing a line of analysis for measuring OWL. Although not always formalized, the steps
are: first, develop a system model which organizes the available facts; second, determine what pieces are
missing and where the gaps are, and develop the hypotheses; and third, generate specific questions to be
answered. The point we wish to make is there is nothing so practical as having a system model. This
system description or model provides an organization for the operator behaviors involved and a framework
which is extremely helpful in posing the questions. Such a model can often be obtained from analytical
techniques; analytical techniques often have an important secondary function since they provide the
initial basis of an Army system model, which facilitates the generation of questions and
subsequent answers. In the next section, we will begin to formalize the steps a human factors workload
expert would follow in selecting an appropriate battery of techniques.

There are a variety of analytical techniques which can be used during early concept phases and also
later in development. Not all of these analytical procedures have been fully validated. However, in order
to be validated, they have to be used. Accordingly, we will suggest techniques that appear to be
appropriate, independent of validation. In our discussion, we will describe narratively and show graphically
the reasoning underlying the selection of techniques from the analytical category of the OWL technique
taxonomy. Then we will consider some examples and case studies for empirical techniques. Following
that, we will lay out the considerations for an overall, general matching model which includes both
analytical and empirical techniques. During our data collection, the Army community expressed a desire
for a computer-based rather than a written manual (Hill, Lysaght et al., 1987). To respond to this desire our
expert system will build on developments incorporated in W C FIELDE (Workload Consultant for FIELD
Evaluation) which was built to deal with empirical techniques in an aviation context (Casper, Shively, &
Hart, 1987). At the end of this chapter, we will provide some background on computerized expert
systems.


Analytical  Matching  Model 

Analytical workload assessment techniques can and should be utilized throughout a system's
development cycle, but are especially important at early, pre-hardware stages. As suggested in Chapter
3, there are few good predictive techniques and many of the analytical techniques have limitations.
Nevertheless, the tremendous cost/benefit value of recognizing and diagnosing problems early on
makes the use of these techniques imperative.

This section describes the core of the analytical methods segment of the overall matching model. It will
assist the workload analyst to make intelligent decisions as to which analytical methods to use for a specific
situation. First, the reasoning underlying the model is explicated in narrative form. This presentation is
high-level and is intended to be exemplary, not comprehensive. Then the reasoning is formalized in logic
flow graphical descriptions. We have chosen to begin the formalization process immediately rather than
wait until after validation; in this way, creation of the matching model in the form of an expert system is
facilitated.

System Considerations


The logic underlying this first-cut analytical component is explicated in the following system
considerations. Hopefully, as a result of this report and others (Hill, Plamondon, Wierwille, Lysaght, Dick,
& Bittner, 1987) analytical techniques will receive a boost toward more development and validation. The
main considerations for analytical procedures are:

• What is the stage of development of the system?

- If the system exists only on paper, then analytical techniques are the techniques
of choice.

- Otherwise, if some hardware exists, then both analytical and empirical techniques
are possible. Please note, however, that one should not utilize empirical
techniques without a very clear picture of the questions to be answered.

• Has a mission scenario been developed for the system?

- If the answer is no, then one must be developed. It is absolutely essential to have
a definition of not only what the system must accomplish but also specification of
the accuracy required and the available time in which it has to be done.
Additionally, the conditions under which the scenario is to be accomplished
should be specified. The scenario becomes the specific framework within which
OWL can be assessed, and time and accuracy become the measures of
effectiveness (MOEs) within which the man/machine performance must fall.

- If the answer is yes, then one can proceed.

• Has any workload analysis been done on similar systems?

- If the answer is no, then we start fresh doing an overall analysis, probably in terms
of task analysis or simulation.

- If the answer is yes, then one should build on the analysis which is already
available. Certainly, one would want to compare the new system with other
existing systems via Comparison Analysis.

• Has any workload analysis been done on this system?

- If the answer is no, then we can skip this question.

- If the answer is yes, then presumably more detailed questions should be
addressed. It may then be appropriate to analyze a specific portion of the system
in detail using one of the mathematical model techniques or operator simulation.
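
The four considerations above can be sketched as a small rule function. This is an illustrative Python
sketch only: the `SystemStatus` fields, the `recommend` function, and the wording of the returned
recommendations are hypothetical names chosen here to mirror the questions in the list; they are not part
of any existing expert system.

```python
from dataclasses import dataclass


@dataclass
class SystemStatus:
    hardware_exists: bool          # False means the system exists only on paper
    scenario_developed: bool
    similar_system_analyzed: bool
    this_system_analyzed: bool


def recommend(status):
    """Return recommendations in the order the considerations are asked."""
    steps = []
    if status.hardware_exists:
        steps.append("Analytical and empirical techniques are both possible")
    else:
        steps.append("Use analytical techniques")
    if not status.scenario_developed:
        steps.append("Develop a mission scenario first")
    if status.similar_system_analyzed:
        steps.append("Build on existing work via Comparison Analysis")
    else:
        steps.append("Start fresh with task analysis or simulation")
    if status.this_system_analyzed:
        steps.append("Analyze a specific portion with a mathematical model or operator simulation")
    return steps
```

For a paper-only system with no scenario and no prior work, `recommend(SystemStatus(False, False, False, False))`
yields the analytical-only recommendation, the scenario requirement, and a fresh task analysis or simulation.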


Real  world  constraints 


Having identified system issues, one also needs to consider real world constraints imposed on OWL
analysis. These constraints include limits on the time available to do the analysis (How fast must the
analysis be done? In what time frame?), the manpower available to do the analysis, the level of expertise of
the staff available (which will have an impact on the length of time required and how much can be done), as
well as the level of detail required in the analysis. Additional constraints may exist in the form of computer
facilities to run simulations, both on the hardware side and the software side. (However, it should be
pointed out that both Micro SAINT and HOS-IV are available to Army users.) Applying these constraints
may lead to ruling out certain types of analysis techniques. For example, if only two weeks are available,
then one might only use expert operator opinion to identify chokepoints. Otherwise, a more detailed
analysis should be done.
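
A feasibility filter of this kind might be sketched as follows. The technique table, the minimum durations,
and the `feasible` function are assumptions made for illustration; the only rule taken from the text is that
a two-week window leaves expert opinion as the practical choice and that simulation depends on computer
facilities being available.

```python
# Hypothetical planning figures: (minimum weeks needed, needs simulation facilities).
# These numbers are illustrative assumptions, not values from this report.
TECHNIQUES = {
    "expert opinion": (1, False),
    "task analysis": (4, False),
    "simulation": (8, True),
}


def feasible(weeks_available, has_simulation_facilities):
    """Keep only the techniques that fit the time and facility constraints."""
    return [
        name
        for name, (min_weeks, needs_sim) in TECHNIQUES.items()
        if weeks_available >= min_weeks
        and (has_simulation_facilities or not needs_sim)
    ]
```

Manpower, staff expertise, and required level of detail could be added as further columns in the same
table; they are left out here to keep the sketch short.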

Decision logic


The flow of the decision logic is illustrated in Figure 8-1 and elucidated in the following outline. This
figure does not contain all of the appropriate detail but serves to show the principal steps, primarily for
systems in the PreConcept or Concept Exploration Stages. However, analysis of workload is an iterative
process and these techniques will be useful at any point in the analysis process. Feasibility checks,
shown in the upper right of the figure, are also repetitive; the proposed analysis must be compared
against real world constraints at various steps in the process. Specifically, the feasibility issues are:

• Time constraints - How much time is available to do analysis?

• Manpower constraints - How much manpower is available to do analysis?

• What is the detail required in the analysis?

• What is the required accuracy of the analysis?

• Facilities - Are computers and software available for simulation?

Step I: Has any OWL analysis been done on this system?

Alternatives: 

If  no,  proceed  to  Step  II. 

If  yes,  proceed  to  Step  IV. 

Step II: Are any relevant data available? Check the MANPRINT ON-LINE database in the Soldier
Support Center for possible databases which may contain relevant information. Also
check the Manpower and Training Research Information System (MATRIS) office of the
Defense Technical Information Center, San Diego, for material from their MANPRINT
database. The Army Research Institute (ARI) and the Human Engineering Laboratory
(HEL) have strong human factors engineering expertise and it would be well worthwhile
contacting one or more individuals in these organizations. If no relevant information is
found, go to Step III; otherwise, if relevant information is found, go to Step IV.


[Figure 8-1 flowchart. Recoverable box labels: System in Concept Phase; Any OWL Work Done?; Identify
Type Work Done; Comparison System; Comparison; Expert Opinion; Select Model; Feasibility Check;
Check Databases; Report.]

Figure 8-1. Diagram of OWL analytical Matching Model.

Step  III:  Has  the  mission  scenario  been  developed? 

Requirement: If a mission scenario is not available, it must be developed before
proceeding. (A feasibility check should also be done at this point.)


Expert opinion: Use expert opinion from one or more individuals to identify questions
of interest. Experts should be able to identify possible chokepoints to
focus on in the analysis.

AND


Perform  task  analysis 
OR 

Simulation. 

THEN 

When  hardware  arrives,  other  techniques,  especially  empirical  ones,  can  be 
used. 

Step IV: Is the previous OWL analysis information of interest for a comparable system or on this
system?

If  OWL  research  has  been  done  on  a  comparable  system 
THEN  DO 

Comparison  Analysis  between  the  older  system  and  the  current  system. 
OTHERWISE  select  one  of  the  following  specific  issues  for  the  current  system. 


Issue  1:  Re-evaluation  or  additional  work  needed,  that  is,  inadequate  information  is 
available. 


DO 

Expert opinion: Use expert opinion from one or more individuals to identify questions
of interest. Experts should be able to identify possible chokepoints to
focus on in the analysis.


AND 


Task  Analysis 
OR 

Simulation. 

Issue 2: Functional re-allocation of man and machine tasks.

DO


Expert opinion: Use expert opinion from one or more individuals to identify questions
of interest. Experts should be able to identify possible chokepoints to
focus on in the analysis.


AND 


Simulation 

Issue  3:  Specific  design  issues  (clarify  data  and  issues). 

DO 


Expert opinion: Use expert opinion from one or more individuals to identify questions
of interest. Experts should be able to identify possible chokepoints to
focus on in the analysis.

AND  one  or  more  of  the  following 

Math  models: 

Anthropometric  model 

Sensory  model 

Manual  Control  model 

Queuing  Theory  model 

Task  analysis: 

Cognitive  task  analysis 

Simulation: 


Detailed  network  models  or  HOS  may  really  be  the  only  simulation  models 
specific  enough  to  analyze  design  issues. 

Performance  model  -  Card,  Moran,  and  Newell  (1966). 

Empirical  techniques: 

Part-task analysis can also be accomplished with empirical techniques.

One of the techniques recommended throughout is the elicitation of operator expert opinion. Often
the individual developing the OWL analysis does not have direct, first hand experience on the operation
of the system. Use of operator experts can both save time and provide a focus on operator chokepoints.
As one can imagine, the definition of an expert varies widely and one needs to be aware of the
background of the expert. For example, an airline pilot is certainly an expert on aviation, but would not
normally have substantial background on advanced avionics or advanced display technology. A test pilot,
by contrast, would likely have a much richer and broader experience with new devices and technology and
would be able to identify more quickly the potential trouble spots. This does not mean that expert opinion
is not useful; it simply means it should be put in the context of the experience of the operator.

Task analysis is also a technique utilized very generally. Typically, a task analysis is done in concert
with, or directly following, a mission scenario development, which is required for all systems. The task
analysis forms the basis for performing more formal analytical techniques such as mathematical modeling
and simulation, and serves as a guideline for any empirical work.
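
The outline of Steps I through IV can be condensed into a short routine. The following Python sketch is a
simplification under stated assumptions: the function name `select_analysis`, its parameters, and the plan
strings are hypothetical, the repeated feasibility checks are omitted, and the "one or more of" choices in
Issue 3 are collapsed into a single entry.

```python
# Plans for the three Step IV issues, paraphrased from the outline above.
ISSUE_PLANS = {
    1: ["expert opinion", "task analysis or simulation"],
    2: ["expert opinion", "simulation"],
    3: ["expert opinion",
        "one or more of: math models, cognitive task analysis, "
        "detailed simulation, part-task empirical techniques"],
}


def select_analysis(prior_owl_on_system, relevant_data_found=False,
                    scenario_developed=False, comparable_system=False, issue=1):
    """Walk Steps I-IV and return the list of recommended activities."""
    # Step I (prior work on this system) or Step II (relevant data found
    # in the databases) both lead to Step IV.
    if prior_owl_on_system or relevant_data_found:
        if comparable_system:
            return ["comparison analysis"]
        return list(ISSUE_PLANS[issue])
    # Step III: a mission scenario is required before proceeding.
    plan = []
    if not scenario_developed:
        plan.append("develop mission scenario")
    plan.extend(["expert opinion", "task analysis or simulation"])
    return plan
```

Expert opinion appears in every branch except the pure comparison case, which reflects its role in the
outline as the technique recommended throughout.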

Empirical Case Studies


Portions of the empirical matching model are already available in the NASA Ames Expert System W C
FIELDE (Casper et al., 1987). This system has been reviewed by experts in workload research and has
gone through several revisions. The Matching Model outlined in this chapter is anchored on the structure
of W C FIELDE. However, the workload approach as characterized by W C FIELDE omits some issues of
major interest to the Army community. W C FIELDE, in particular, does not have the capability for direct
comparison of two or more systems nor does it consider individual differences. Of potentially more
importance, it also does not consider conditions under which the system must operate, such as battlefield
conditions, or system support requirements. Many of these conditions cannot be tested except in an
analytical way. These are not criticisms of W C FIELDE. Most of the OWL literature, being more
academically based, frequently addresses issues directed to a theoretical interest, instead of those central
to the goal of application. These academic researchers by and large have not only ignored some issues,
but have actively sought to reduce or eliminate them as contaminants of the 'real' issues they wished to
study. Although theoretical research is productive and important, it is not sufficient. Individual
differences, which are frequently controlled experimental factors in research laboratories, and the
comparison of combat systems are extremely critical factors in Army systems development.

It is our intention to develop a complete and integrated matching model for both analytical and empirical
techniques, and one which will provide OWL measures sensitive to individual differences as well as to
comparing several systems. In this section, we will discuss some examples of system design issues and
provide recommendations for selecting empirical measures and appropriate analytical techniques for
situations of immediate interest to the Army. The issues are focused on different system evaluation
problems. For example, sometimes a workload study may be devised; other times the study has already
been done. The measures suggested are the minimum one should collect. The cost of collecting the
data and the analysis requirements have been taken into account in our recommendations.

System  Design  and  Development  Example  1 

Description of Example. You have a system which requires that the operator routinely perform several
tasks or sub-tasks (e.g., tracking targets, radio communications, weapon delivery, etc.) in order to carry out
a mission. You are interested in knowing whether an operator can adequately handle the system.
Specifically, what are the limits in the operator's performance before the operator's performance
deteriorates, that is, shows signs of overload? The following steps are recommended and are also
illustrated in Figure 8-2. (The numbering of the steps matches the Roman numerals in the figure.)

Step I: Identify the conditions under which this system will be used. Then, identify those conditions
which can be tested. A feasibility check is appropriate at this point in the process.

Step II: Define your measures for the primary task, including the overall system measures (Type 1) and
operator response measures (Type 2) as discussed in Chapter 4. These performance measures may yield
important information on overload and system instability, as well as permitting inferences on performance
rule changes. In addition, consider the use of SWAT or TLX to get quantified measures of the operator's
opinions about workload, as well as interviews of the operator to get additional detail. Finally, the heart rate
measure can be useful as a physiological index and can yield some additional information. Depending on
the context, one might wish to consider use of a helmet mounted eye tracking camera for diagnosis.
Video taping the operator during performance of system tasks is highly recommended and can be used
for delayed, retrospective TLX (or SWAT) ratings.

Step III: Perform the study. But before commencing, review the feasibility. Are the techniques
feasible? What are the time constraints? How much time is available to do analysis? How much manpower
is available to do analysis? What is the detail required in analysis? What is the required accuracy of the
analysis? Are computers and software facilities available?

Figure 8-2. Illustration of the matching logic for determining the selection of OWL techniques for
assessing overload in a system.


Step IV: Check the primary measures for performance decrements and OWL problems. If there are
indications of problems, proceed to Step V. If there are no apparent problems, jump to Step VI.


Step V: Examine the fine structure of the primary task to look for performance rules. Examine the detail
of the subjective scales (TLX or SWAT) to try to diagnose and identify the specific issue or problem. The
heart rate data can be analyzed to give more detail and the temporal locus of the problem.

Step VI: If there are no apparent OWL problems, more work may still be required. There may be a need
to look at the conditions which were not tested, such as environmental extremes. This could be done by a
mixture of analytical and empirical techniques. The analytical portion would include use of expert operator
opinion both through interviews and quantification through the use of ProSWAT (or ProTLX). (If video
tapes were made originally, the video tape may be useful here for replay to the operator for retrospective
ratings.) Model simulations of the potential chokepoints could also be done, such as Micro SAINT or HOS,
to test extreme conditions. The empirical techniques would focus on secondary tasks in the attempt to
drive the operator to higher workload levels. The secondary task results coupled with the primary
measures will yield important data about strategies and identify borderline workload portions.
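The branch from Step IV to either Step V or Step VI can be sketched in a few lines of Python. This is only an illustrative outline of the logic described above: the function name `follow_up_actions` and the wording of the action strings are assumptions introduced here, not part of the matching model itself.

```python
def follow_up_actions(problem_found):
    """Map the Step IV outcome to the step that follows and its activities.

    problem_found: True if the primary measures show performance
    decrements or OWL problems (hypothetical flag set by the evaluator).
    """
    if problem_found:
        # Step V: diagnose the problem that the primary measures revealed.
        return ("Step V", [
            "examine fine structure of the primary task for performance rules",
            "examine TLX or SWAT scale detail to diagnose the problem",
            "analyze heart rate data for the temporal locus of the problem",
        ])
    # Step VI: no apparent problem, so probe conditions that were not tested.
    return ("Step VI", [
        "interview expert operators about untested conditions",
        "quantify expert opinion with ProSWAT or ProTLX",
        "simulate potential chokepoints (e.g., Micro SAINT or HOS)",
        "add secondary tasks to drive the operator to higher workload",
    ])


step, actions = follow_up_actions(problem_found=False)
print(step)
for action in actions:
    print(" -", action)
```

The point of the sketch is that a clean result at Step IV does not end the assessment; it redirects effort toward untested conditions.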

System Design and Development Example 2

Description of Example. You have two alternative designs of a system or sub-system which have been
shown by previous testing to be essentially the same (no differences) with respect to primary task
measures. In this situation, you are faced with what appears to be two comparable designs. Which design
do you choose? Since the testing is already done, this can result in some serious problems as will
become apparent in the discussion.

Step I: Identify the conditions under which this system will be used. Then identify those conditions
which have been tested. A feasibility check is appropriate at this point in the process.

Step II: Determine the level of data available. The data one would want are those described in Example
1; specifically, complete primary task data and the subjective scale data. Additional data are always
welcome, especially video tape of the operators. If the primary task and subjective scale data are available,
go to Figure 8-2 and follow the accompanying description, especially from Step IV on. If these data are not
available, there are a few things one can do.

• Redo the OWL analysis as described in Example 1, Step IV.

• If video tapes are available, then one can ask operators to use SWAT (or TLX)
retrospectively on the video tapes.

• Use analytical techniques as described in the Analytical Matching Model.

• If no data are available and you cannot do any of the above, our advice is: Don't ever
get into this situation. All you can do is start at Step I as described in Example 1.


System Design and Development Example 3


Description of Example. You have a system that is under a Product Improvement Program (PIP) or P3I
for enhancements or modifications. You are interested in whether the operator can handle the new
capabilities and/or new functionality that is planned.

Step I: Identify the conditions under which this system will be used. Then, identify those conditions
which have been tested. The system may be manageable under test conditions but may not be
manageable under more extreme, e.g., combat, conditions. Note the application of a feasibility check at
this point in the process.

Step IIa: Plan a task analysis to determine if the new system capabilities involve new tasks which are
added on, or if the new system capabilities will help the operator perform his duties, or both.

Step IIb: Plan a comparison analysis incorporating expert opinion.

Step  III:  Since  this  is  an  existing  system,  a  workload  study  can  be  conducted  on  the  present  system 
with  a  secondary  task.  This  task  could  be  to  measure  the  operator's  spare  capacity  and  to  look  for 
performance  changes  and  especially  performance  rule  differences.  The  secondary  task  should  be 
selected  to  be  comparable  if  not  analogous  to  the  planned  modifications. 

Step IV: Do a feasibility check before starting. Are all the techniques feasible? How much time is
available to do analysis? How much manpower is available to do analysis? What is the detail required in
analysis? What is the required accuracy of the analysis? Are computers and software available for
simulation? Are computers available for data analysis?

System Design and Development Example 4


Description of Example. You have a system under test and evaluation. You are not only interested in
knowing  whether  the  system  can  be  handled  by  operators  within  the  context  of  a  mission  scenario  but 
also  where  the  high  workload  areas  are  that  could  lead  to  operator  workload  problems. 

Step I: Identify the conditions under which this system will be used. Then, identify those conditions
which have been previously tested. Again, do a feasibility check at this point in the process.

Step II: Plot the mission profile. An example of this in aviation would be take off, ascent, cruise,
descent, approach, and landing. Use expert opinion to determine those mission segments which have
higher workload than others. If you wish, use ProSWAT (or ProTLX) to quantify the expert opinion.

Step III: Do a cognitive task analysis to identify the goals and strategies. This will assist in selecting
primary measures and secondary tasks.

Step IV: Then perform the study as suggested in Example 1.

Step V: Are the analytical techniques feasible? Are the empirical techniques feasible? How much time
is available to do analysis? How much manpower is available to do OWL analysis? What is the detail
required in analysis? What is the required accuracy of the analysis? Are computers and software available
for simulation? For data analysis?
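Step II of this example, in which expert opinion is used to single out the higher workload mission segments, might be sketched as follows. This is a minimal illustration under stated assumptions: the rating values, the 0-100 scale, the threshold, and the function name `high_workload_segments` are all hypothetical and are not taken from ProSWAT or ProTLX.

```python
from statistics import mean


def high_workload_segments(ratings, threshold):
    """Return mission segments whose mean expert rating meets the threshold.

    ratings: dict mapping segment name -> list of expert ratings
             (hypothetical 0-100 scale; not actual ProSWAT output).
    threshold: hypothetical cut-off chosen by the evaluator.
    """
    means = {segment: mean(values) for segment, values in ratings.items()}
    # Preserve mission order so the result reads as a profile.
    return [segment for segment in ratings if means[segment] >= threshold]


# Invented ratings from two experts for three segments of an aviation mission.
example_ratings = {
    "take off": [70, 80],
    "cruise": [20, 30],
    "landing": [85, 75],
}
print(high_workload_segments(example_ratings, threshold=60))
```

The segments returned are the ones that would receive the most attention when primary measures and secondary tasks are selected in Step III.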

System Design and Development Example 5

Description  of  Example.  How  would  you  deal  with  individual  differences,  personnel  considerations  and 
OWL while developing and testing a system?

This example on individual differences often falls in the cracks between operator workload and
personnel  issues.  The  Army  has  a  wide  range  of  personnel  capabilities  and  any  OWL  analysis  should 
include  this  consideration.  The  operators  performing  test  evaluations  may  not  be  representative  of  the 
overall  population.  Whether  the  operators  are  representative  is  something  that  can  be  evaluated. 

There are a number of tools being created for the concept/design phase of system development which
will help to answer personnel/OWL problems. In particular, the MANPRINT Methods Product 6 is an
analytical tool which is being designed to address the questions of what kinds of personnel characteristics
are necessary to operate (and maintain) systems.

At present, the evaluator can do some straightforward things such as get as much information from the
test operator's personnel file as possible. These items would include the ASVAB, any MOS information
available, and Skills Qualifying Test (SQT). It is not suggested here that any one or all of these items
together will provide detailed predictions of performance on a system. They will permit some relative
comparisons of the general capabilities of the operators tested with the pool of operators for which the
system was designed.

The Basis of a General Matching Model

The objective in developing a general matching model is to provide a basis for the systematic selection
of a good, if not optimal, set of workload assessment techniques for a given circumstance. As this is an
ambitious undertaking, we have begun what is clearly an evolutionary process. Such a beginning will
serve to stimulate rapid growth in the area of workload assessment. The model offered here builds and
expands upon the concepts contained in W C FIELDE.

The particular use of the matching model will depend on the situation of the user. The user may be the
OWL assessor, or the user may designate that role to a designer, an engineer, etc. Many variables
properly affect the selection of an assessment technique battery, so all appropriate personnel involved in
the development and evaluation should collaborate in this decision. The model offered is designed to
enhance this collaboration.

Specific Goals and Objectives

The specific goals of this effort were to develop the framework for a user-centered expert system /
decision aid technique using:

• The Matching Model to guide the user to the appropriate workload methodology;

• The OWL Information System to guide the user to the appropriate background
literature;

• Other Databases to guide the user to the appropriate and available comparison
systems; and

• Other tools as may be available or are in development such as MANPRINT Methods
being developed by ARI.

The interrelation of the component parts of this approach is illustrated in Figure 8-3. In developing this
overall model, our guiding principle was and is that the user need not be an expert in human performance
technology, statistics and data analysis, laboratory work, and using computers. It was assumed that the
user will be responsible for deciding on workload analysis techniques and will be responsible for getting
the analysis done. Furthermore, it is anticipated that the user may not be totally familiar with the Army
system acquisition processes. The emphasis is not to make a user into a human factors engineer; rather, the
goal is to make the user more knowledgeable about what needs to be done and what the available and
preferable options are. Whereas the Matching Model may assist the user in selecting the appropriate
techniques, the application of a workload technique may require the assistance of a human factors
engineer. Consequently, there is an attempt to identify and locate expertise wherever possible.

Figure 8-3. Illustration of the various components feeding into the Matching Model.

Matching  Model  Development 

We  began  to  construct  our  matching  model  by  developing  a  list  of  relevant  user  questions,  in  a  format 
appropriate  for  an  expert  system.  The  terminology  follows  that  of  an  expert  system  shell  and  that  of  W  C 
FIELDE.  Each  entry  in  the  list  consists  of  a: 

•  Question  -  to  be  answered  by  the  user, 

•  Reason  -  basis  for  a  decision  rule  (or  set  of  rules)  based  on  the  answer  to  the 
question,  and 

•  Alternative  User  Responses  -  possible  user  answers  to  the  question. 
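The three-part entry format described above (Question, Reason, Alternative User Responses) maps naturally onto a small record type. The following is a minimal sketch, assuming Python dataclasses as the representation; the class name `QuestionEntry`, its field names, and the `check` helper are illustrative choices made here and do not come from any particular expert system shell. The sample entry paraphrases the derivative-system question from the list.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QuestionEntry:
    """One entry in the matching model's question list (hypothetical layout)."""
    number: int
    question: str            # asked of the user
    reason: str              # basis for the decision rule(s)
    alternatives: List[str] = field(default_factory=list)  # allowed answers

    def check(self, answer: str) -> str:
        """Return the answer if it is one of the listed alternatives."""
        if answer not in self.alternatives:
            raise ValueError(
                f"Question {self.number}: unexpected answer {answer!r}"
            )
        return answer


q12 = QuestionEntry(
    number=12,
    question="Is this a derivative system or a brand new one?",
    reason="A derivative system can probably be tested in a generic simulator.",
    alternatives=["New", "Derivative", "Don't know"],
)
print(q12.check("Derivative"))
```

Keeping the reason alongside the question lets a decision aid explain to the user why each answer matters, which is in keeping with the user-centered goal stated earlier.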


Next, our set of questions was compared with the issues covered in W C FIELDE. Differences were
analyzed and appropriate revisions were made in our list. The result of this comparison process is the list
of 23 questions presented below, which form the basis of expert system development. Table 8-1 contains
the set of operator workload techniques to be included in the Matching Model.


Question 1: What is the type of acquisition process the system is going through?

Reason: The selection of OWL techniques will depend on the type of acquisition process
being used. A Non-Developmental Item may not have the concept phases and
therefore both analytical and empirical techniques can be used from the beginning.
Further, the time available for OWL analysis will vary.

Alternatives: 

Traditional  Materiel  Acquisition  Process  (MAP) 

ASAP 

NDI

P3I 

PIP 

Question  2:  If  traditional  MAP,  what  stage  in  the  system  acquisition  cycle  is  the  man-machine  system 
currently  in? 

Reason:  The  first  two  alternatives  are  predominantly  evaluated  by  analytical  techniques. 

Usually,  there  is  no  hardware  such  as  a  simulator  available  to  do  any  detailed  testing 
with  an  operator  in  the  loop.  There  are  exceptions  to  this;  for  example,  availability  of 
generic  simulators  and  rapid  prototyping  systems  in  which  new  displays  can  be 
installed  and  evaluated.  The  answer  to  this  question  has  an  impact  on  deciding  which 
category  of  technique  is  more  appropriate.  While  any  technique  can  be  used  at  any 
stage  of  development,  typically,  fewer  possibilities  exist  during  early  stages  of 
development.  Flexibility  in  selection  of  various  evaluative  techniques  increases  as  we 
go  down  the  list.  In  some  sense,  the  ease  and  cost  of  evaluation  is  also  influenced, 
e.g.,  if  only  one  simulator  exists,  scheduling  time  to  perform  tests  will  be  more  difficult 
than if several simulators exist. The capability of changing the workload through
system  design  decreases  as  we  go  down  the  MAP  list  for  cost  reasons.  Either  of  the 
first  two  alternatives  will  result  in  a  suggestion  of  analytical  techniques.  Falling  into  the 
latter  three  categories  does  not  eliminate  the  possibility  of  using  analytical  techniques. 

Table 8-1. Complete list of techniques and measurement procedures for OWL.

Alternatives

Analytical Procedures
    Comparison Analysis
        Early Comparability Analysis
    Expert operator opinion
        Prospective rating scales - ProSWAT
        Other
    Mathematical models
        Manual control models
        Information theory model
        Queueing theory
        Other
    Task analysis methods
        Task Analysis *
        HRTES
        McCracken-Aldrich
        Cognitive task analysis
    Simulation models
        Time line
        Performance model (Card/Moran/Newell)
        Micro Saint - network simulations
        SWAS
        SIMWAM
        Human Operator Simulator (HOS)

Empirical Procedures
    Primary task
        System Response
            RMS Error
        Performance related
            Primary task speed *
            Primary task accuracy *
            Fine structure
            Other
    Subjective scales
        Rating scales
            Analytic Hierarchy Process
            Cooper Harper *
            Honeywell version of Cooper-Harper
            Modified Cooper Harper *
            Bedford *
            SWAT *

* The measurement technique is included in W C FIELDE.


Table 8-1. Complete list of techniques and measurement procedures for OWL (continued).

Empirical Procedures (cont.)
    Subjective scales (cont.)
        Rating scales (cont.)
            NASA TLX (NASA Bipolar) *
            WCI/TE
        Psychometric methods
            Magnitude estimation
            Equal interval
            Paired Comparisons
        Specialized scales
            Pilot Subjective Evaluation
            Dynamic Workload Scale
        Questionnaires/Survey
        Interviews
        Other
    Secondary task
        Embedded secondary tasks *
        Dual tasks
            Sternberg Memory *
            Mental math *
            Shadowing *
            Time estimation *
            Communications *
            Tracking *
            Monitoring *
            Choice RT *
        Embedded secondary tasks *
        Other
    Physiological & eye movements
        Heart rate *
        HR variability (0.1 Hz) *
        Body fluid
        CFF
        Eye measurements
            Eye point of regard - Eye movements
            Eye blinks *
            Pupil diameter *
        EEG (brain activity)
            Evoked potential *
        Blood pressure
        GSR (skin)
        EMG (muscle)
    Other techniques
        Video tape

* The measurement technique is included in W C FIELDE.

However, the emphasis may shift to empirical techniques and the data
requirements are much more rigid due to the magnified cost of design
changes. Here we need to focus on precise detailed problems. Part-task
studies are quite useful to decide whether to make some hardware changes
or possibly add decision aids.

Alternatives 

Pre-concept  exploration 
Concept  exploration 
Demonstration  &  validation 
Full  scale  development 
Production  &  deployment 
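The rule stated in the Reason for Question 2 can be written down directly. The sketch below assumes one possible encoding: the stage names are the five alternatives listed above, while the function name `suggested_categories` and the ordering of the returned list (emphasis first) are assumptions made for illustration.

```python
# The five traditional MAP stages, in the order given in the alternatives.
MAP_STAGES = [
    "Pre-concept exploration",
    "Concept exploration",
    "Demonstration & validation",
    "Full scale development",
    "Production & deployment",
]


def suggested_categories(stage):
    """Return technique categories for a MAP stage, emphasized category first."""
    index = MAP_STAGES.index(stage)  # raises ValueError for an unknown stage
    if index < 2:
        # First two alternatives: usually no hardware, so analytical techniques.
        return ["analytical"]
    # Latter three: emphasis may shift to empirical, analytical still possible.
    return ["empirical", "analytical"]


print(suggested_categories("Concept exploration"))
print(suggested_categories("Full scale development"))
```

The exceptions noted in the Reason (generic simulators, rapid prototyping) are not modeled here; a fuller rule base would let the answer to the apparatus question override this default.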

Question  3:  What  is  the  time  frame  in  which  workload  analysis  must  be  complete? 

Reason: Determine the impact of the analysis time frame on techniques selected, e.g.,
if time is short, then use subjective techniques for both analytical and
empirical purposes.

Alternatives: 

Less  than  a  month 
One  to  2  months 
Within  6  months 
Within  1  year 
More  than  a  year 

Question 4: What sort of system apparatus exists to assess workload during performance of
primary tasks?

Reason: If no hardware exists, then we must rely on OWL analytical techniques, i.e.,
task analysis, simulation models, etc.

Alternatives: 

Simulators
    specific to current system
    generic
Prototypes
Mock-ups
Production system

Question 5: What computer software facilities are available?

Reason: If no software exists, then we must go to other techniques such as pencil and
paper techniques, i.e., task analysis, but cannot use simulation models, etc.

Alternatives: 

Computer simulation models
    Time line analysis
    Micro-Saint
    HOS
    Other

Data collection (interface software)

Data analysis

Statistical analysis packages

Question 6: What computers are available?

Reason:  It  requires  a  computer  to  run  simulations.  Different  simulations  run  on  specific 

machines  and  may  not  be  compatible  with  other  machines. 

Alternatives: 

Micro-computer  (IBM-PC/AT)  or  compatible 
VAX 

Main  frame 
Other 
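Questions 3 through 6 act as feasibility filters: each answer can rule out whole families of techniques. A minimal sketch of that filtering follows. The family names are an abbreviated subset of Table 8-1, and the function name, the argument names, and the choice to treat "Less than a month" as the only "short" time frame are all assumptions made here for illustration.

```python
ANALYTICAL = {"expert opinion", "task analysis", "simulation models"}
EMPIRICAL = {"primary task", "subjective scales", "secondary task", "physiological"}
# Subjective techniques exist on both the analytical and empirical side.
SUBJECTIVE = {"expert opinion", "subjective scales"}


def feasible_families(time_frame, has_hardware, has_sim_software, has_computer):
    """Return the technique families left after the resource questions."""
    families = ANALYTICAL | EMPIRICAL
    if not has_hardware:
        # Question 4: no apparatus, so rely on analytical techniques.
        families -= EMPIRICAL
    if not (has_sim_software and has_computer):
        # Questions 5 and 6: simulation needs both software and a machine.
        families.discard("simulation models")
    if time_frame == "Less than a month":
        # Question 3 (assumed threshold): short time favors subjective techniques.
        families &= SUBJECTIVE
    return sorted(families)


print(feasible_families("Less than a month", True, True, True))
print(feasible_families("Within 1 year", False, False, True))
```

Using set operations keeps each rule independent, so further questions (laboratory facilities, staff support, manpower) could be added as additional filters without rewriting the earlier ones.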

Question  7:  What  sort  of  laboratory  facilities  are  available  for  empirical  work? 

Reason: Some empirical techniques require specialized facilities or equipment.
Primary and secondary techniques may require equipment to present tasks
and record responses. Subjective techniques may use computers or
paper and pencil. Physiological techniques may require equipment, such
as sensors, to record physical responses.

Alternatives: Video, Audio, EEG, EKG, Pupil diameter measurement equipment, and
Oculometer, etc.

Question 8: What staff support is available either in house or through another organization?

Reason:  It  is  necessary  to  have  the  expertise  (or  the  expert)  available  on  the  various 

topics. 

Alternatives: 

Expert  operators  on  this  or  similar  systems 
Technicians,  electronic,  computer 
Human  Factors  specialists 
Personnel  for  testing  in  laboratory  or  field 
Software  developers  -  programmers 
Statistical  analysis  support 

Psychometric scaling and/or questionnaire expertise

Question  9:  How  much  staff  or  manpower  is  available  to  do  the  OWL  analysis? 

Reason:  Certain  techniques  (especially  empirical)  are  very  labor  intensive.  Certain 

techniques  are  more  flexible  than  others  in  terms  of  manpower  requirements. 

Alternatives: 

Less than 1 work week
Less than 1 work month
One to six work months
Six months to 1 work year
More than 1 work year

Question 10: Why is OWL assessment being done?

Reason: The reason OWL assessment is being done will influence the types of techniques
used.

Alternatives:

MANPRINT requirement

Comparability  analysis  suggests  chokepoint 

Chokepoint  already  identified 

Comparison  of  two  (or  more)  candidate  systems 

Examination  of  individual  differences 

Question  11:  What  is  the  Mission  Area  (13  areas)? 

Reason:  Answers  to  this  question  will  be  helpful  in  directing  the  user  to  appropriate 
information  already  existing  on  workload  evaluation.  This  breakdown  will  be 
helpful in tracking down comparable systems and may or may not be useful in
the  matching  model.  For  instance,  aviation  systems  have  had  considerable 
evaluation  in  the  commercial  arena  by  NASA,  FAA,  and  by  commercial  aircraft 
companies.  Other  areas  may  or  may  not  have  a  similar  counterpart.  This  is 
also  an  attempt  to  use  all  information  which  may  be  available  in  other 
databases. 

Alternatives: 

Close  Combat  (heavy) 

Close  Combat  (light) 

Aviation 
Air  Defense 

Combat Support, Engineering, & Mine Warfare

Combat Service Support

Fire Support and Target Acquisition

Nuclear, Biological, Chemical

Command & Control

Communications 

Intelligence  &  Electronic 

Special  Operations 

Combined  Arms 

Question 12: Is this a derivative system or a brand new one?

Reason: If it is a derivative system then the system can probably be tested in a generic
simulator using the old system simulation model with mock-ups of the new
operator controls and procedures.

Alternatives: 

New 

Derivative 
Don't know

Question 13: What are the criteria against which to judge OWL with respect to overall man-machine
system performance?

Reason: Need to know how the criteria were developed and to what they refer (this
defines the boundaries of the criteria). Differentiate between system
performance which includes the man and machine vs. human performance
alone. A standard is needed to determine satisfactory system performance.

Alternatives: 

Time  requirements  for  mission  objectives 
Accuracy  /  Error  requirements  for  mission  objectives 
Both  time  and  accuracy. 

Not  identified 

Question  14:  What  operating  conditions  (e.g.,  environmental  conditions)  and/or  system  usage 
factors  need  to  be  addressed  or  simulated  by  OWL  assessment? 

Reason: There are likely to be conditions under which the system cannot be tested
even though they are conditions within which the operator would be under
extreme stress, for example, battlefield conditions. These conditions will
need to be addressed with analytical techniques even if the system exists.
(Part of the answer could be to highlight or alert the user to the existence of
voids in the availability of techniques.)

Alternatives: 

Deep Battle Environment
Covering Force Operations
Main Battle Area Environment
Rear Areas
Support Activities
Training

NBC Environment

Climatic Conditions - heat, cold, etc.

Noise 

Vibration 

Question 15: Are individual operator differences important? That is, does the OWL analysis need
to take into account the caliber and number of individuals available?

Reason: This question has to do with empirical evaluations. Test systems are often
evaluated using top caliber operators. Even if these individuals are successful, less
capable operators may not be. This suggests getting ASVAB and other data
available on the operators and comparing these test scores to the general
level within the MOS.

Alternatives: 

Yes 

No 
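One simple way to compare the test operators' scores with the general level within the MOS is a standardized difference between the sample mean and the population mean. The sketch below is an illustration only: the scores, the population mean, and the population standard deviation are invented numbers, and the function name `standardized_difference` is an assumption introduced here.

```python
from statistics import mean


def standardized_difference(sample_scores, population_mean, population_sd):
    """Distance of the test operators' mean from the MOS mean, in SD units."""
    if population_sd <= 0:
        raise ValueError("population_sd must be positive")
    return (mean(sample_scores) - population_mean) / population_sd


# Hypothetical aptitude scores for four test operators, compared with a
# hypothetical MOS population (mean 105, standard deviation 10).
test_operator_scores = [118, 122, 125, 115]
d = standardized_difference(test_operator_scores,
                            population_mean=105, population_sd=10)
print(f"Test operators sit {d:.2f} SD from the MOS mean")
```

A large positive value would suggest that the test operators are well above the typical operator, so successful test results may overstate how the general population would cope with the system.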

Question 16: What are the primary measures of human performance in the system?

Reason: This is an attempt to help the user define successful performance.

Alternatives: 

Time  requirements 
Accuracy  (or  error)  requirements 
Both  time  and  accuracy 
Fine  structure  of  behavior 
Not  Identified 

Question 17: What are the qualifications/characteristics expected for operators of the system?
Do you need to consider manpower, personnel, and training issues?

Reason: This question has to do with MPT objectives. This is a step toward defining
individual differences. If the analysis includes man-in-the-loop, then we would
recommend getting ASVAB, MOS test data, Skills Qualifying Test (SQT), and
other available information. By knowing that, we would be able to infer what
the dominant characteristics must be. (This question has to be stated in
appropriate terminology; otherwise, the user may not know how to answer
this very well.)

Alternatives:

Manpower requirements (e.g., crew size)

Personnel requirements - Aptitudes (e.g., coding speed)

Training - Skills and knowledge of soldier (e.g., time/accuracy requirements
for performance of system tasks, knowledge of other systems interfacing with
system to be developed)

Question 18: Has any OWL analytical analysis been done?

Reason: Analysis of workload is an iterative process throughout the acquisition
development cycle. It is important to determine whether system performance
requirements were fulfilled and to identify the workload techniques used.

Alternatives: Analytical Procedures

Table 8-1 provides a list of these alternatives.

Question 19: Has any OWL empirical analysis been done?

Reason:  When  empirical  analysis  is  possible  (later  in  the  development  cycle)  the 

information  gained  is  very  valuable  to  users  and  future  OWL  assessment. 
Empirical analyses, in general, have more face validity than analytical
techniques  because  they  are  more  grounded  in  reality. 

Alternatives:  Empirical  Procedures 

Table 8-1 provides a list of these alternatives.

Question 20: What operator performance characteristics are relevant to the particular man-machine
system? (Universal operator behavior dimensions [Berliner, Angell, & Shearer, 1964])

Reason: We are interested in the categories of behaviors the operator must use. These
questions can relate to the operator's performance and to the ability of the system to
meet performance characteristics, e.g., servicing targets, flying specific numbers of
missions per day. (The user may not know how to answer this very well in the form
given for the alternatives. It may be helpful to define the type of equipment and then
to infer what the behavioral categories will be. If all of these alternatives are selected it
will be necessary to break them out into subsets to deal with them more efficiently.)

Alternatives: 

Perceptual 
Mediational
Communication 
Motor  processes 

(A complete list is provided in Table 8-2.)

Question  21 :  Can  the  operator  be  interrupted  during  a  mission  or  are  there  blocks  of  time  during 
the  mission  in  which  the  operator  can  fill  out  forms? 

Reason: Subjective measures require some time for filling out the rating forms. If the
operator cannot be interrupted, then it is better to video tape the session and
have the ratings completed later.

Alternatives: 

Yes 

No 

It  is  possible  to  use  video  tape  and  get  ratings  later 

Question 22: Does the operator have spare time to do other things at various points in the mission?

Reason: Secondary tasks may be used if there is some spare time.

Alternatives:

Yes 

No 
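The answers to Questions 21 and 22 constrain which empirical measures can be collected during a mission. One possible encoding is sketched below. The answer strings are the alternatives listed above; the function name `empirical_options`, the wording of the returned options, and the decision to offer no subjective rating option when the answer to Question 21 is a plain "No" are assumptions made for illustration.

```python
VIDEO_ANSWER = "It is possible to use video tape and get ratings later"


def empirical_options(q21_answer, q22_answer):
    """Return data-collection options implied by Questions 21 and 22."""
    options = []
    if q21_answer == "Yes":
        # Operator can be interrupted or has time blocks to fill out forms.
        options.append("subjective ratings during the mission")
    elif q21_answer == VIDEO_ANSWER:
        # Operator cannot be interrupted: tape the session, rate afterwards.
        options.append("retrospective subjective ratings from video tape")
    if q22_answer == "Yes":
        # Spare time exists, so a secondary task can be inserted.
        options.append("secondary task")
    return options


print(empirical_options("Yes", "No"))
print(empirical_options(VIDEO_ANSWER, "Yes"))
```

An empty result signals that neither in-mission ratings nor secondary tasks fit the mission as described, which would push the assessment toward primary task and physiological measures or analytical techniques.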

Question  23:  What  is  the  required  duration  of  operator  performance? 

Reason:  Again  this  is  an  important  determinant  of  the  types  of  data  which  can  be 
collected. Short term and long term performance are different situations and
require  different  treatment. 

Table 8-2. Listing of Berliner et al. (1964) taxonomy of cognitive behaviors.

Perceptual processes
    Searching for and receiving information
        Detects, Inspects, Observes, Reads, Receives, Scans, Surveys
    Identifying objects, actions, events
        Discriminates, Identifies, Locates

Mediational processes
    Information processing
        Categorizes, Calculates, Codes, Computes, Interpolates, Itemizes, Tabulates, Translates
    Problem solving and decision making
        Analyzes, Calculates, Chooses, Compares, Computes, Estimates, Plans

Communication processes
        Advises, Answers, Communicates, Directs, Indicates, Instructs, Requests, Transmits

Motor processes
    Simple/Discrete
        Activates, Closes, Connects, Disconnects, Joins, Moves, Presses, Sets
    Complex/Continuous
        Adjusts, Aligns, Regulates, Synchronizes, Tracks
Alternatives:

Less than one minute
Less than an hour
One to two hours
Two to 8 hours
Sustained performance (over 8 hours)


Expert  System  Output 

Possible Recommendations: The outcome and recommendations are selected from a comprehensive
hierarchy of OWL techniques listed in Table 8-1. Those which are addressed
in W C FIELDE are noted with an asterisk.

Outcome Alternatives: The outcome possibilities are the entire set of techniques shown in Table 8-1.


Expert  Systems 

Two issues are considered in this section: what is an expert system, and what are the reasons for an
expert system?

What Is an Expert System?

An expert system codifies the specialized problem solving expertise of an authority or, in some cases,
many authorities to assist in solving complex problems in narrow domains. Expertise in a specific domain
may generally be described as knowledge about the domain, the problems involving the domain, and the
methods and approaches to solving the problems. The terms expert system and knowledge-based
system are often used interchangeably to refer to artificial intelligence based systems that capture
expertise in problem areas. In our approach, an expert system is considered to be a system consisting of
two separate components:

• A knowledge-base representing the heuristics, facts, judgments, and experience
about a selected problem domain.

• An inference processor which interprets the contents of the knowledge-base to infer
conclusions toward a solution of the problem.

The separation of the knowledge from the inferential mechanism permits more flexible development and
application, and more closely follows how humans deal with complex problem domains. Traditionally,
expert systems are generated by a knowledge engineer who questions extensively an expert in a field to
determine information and know-how about a selected topic, and translates the expert's knowledge into a
knowledge-base. This knowledge-base construction is both the heart of and the main bottleneck to
building an expert system.
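
As a rough illustration of the two-component structure described above, the following Python sketch keeps a
small knowledge base separate from a generic inference routine. The rule contents paraphrase Questions 21
and 22 of this chapter; the names (RULES, infer, the fact keys) and the rule wording are hypothetical and
are not taken from W C FIELDE or any other fielded system.

```python
# Knowledge base: each rule pairs a set of conditions with a conclusion.
# Hypothetical rules paraphrasing Questions 21 and 22; not a real system.
RULES = [
    ({"can_interrupt": True}, "subjective ratings during the mission"),
    ({"can_interrupt": False}, "video tape the session, collect ratings later"),
    ({"spare_time": True}, "secondary task measures"),
]


def infer(facts, rules):
    """Inference processor: return conclusions whose conditions all match."""
    conclusions = []
    for conditions, conclusion in rules:
        if all(facts.get(key) == value for key, value in conditions.items()):
            conclusions.append(conclusion)
    return conclusions


# Answers an analyst might give to the two questions (assumed values).
answers = {"can_interrupt": False, "spare_time": True}
for recommendation in infer(answers, RULES):
    print(recommendation)
```

Because the rules are held as data rather than written into the control flow, a rule can be added or revised
without touching infer, which is the kind of flexibility the separation is meant to provide.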

The Reasons for an Expert System

There are a number of reasons for developing an expert system. Many of these reasons are listed in
Table 8-3. While all of these reasons are relevant, the more important ones are: (a) communication of
knowledge easily and efficiently, and (b) consistency and reliability.


Table  8-3.  When  expert  systems  pay  for  themselves  (Van  Horn,  1986). 


•  The  expert  is  not  always  available,  the  expert  is  retiring,  the  expert  is  very  expensive 
or  rare 


•  A  shortage  of  experts  is  holding  back  development  and  implementation 

•  Expertise  is  needed  to  augment  the  knowledge  of  junior  personnel 

• There are too many factors or possible solutions for a human to keep in mind at once,
even when the problem is broken into smaller units

• Decisions must be made under pressure, and missing even a single factor could be
disastrous

• A huge amount of data must be sifted through

• Factors are constantly changing, and it is hard for a person to keep on top of them all
and find what is needed at just the right time

• One type of expertise must be made available to people in a different field so they can
make better decisions

• There is rapid turnover, a constant need to train new people. Training is costly and
time consuming

• The problem requires a knowledge-based approach and cannot be handled by a
conventional computational approach

• Consistency and reliability, not creativity, are paramount

Summary and Conclusions


The intent of this chapter is two-fold within the focus of OWL technique selection. First, the discussion
lays out some examples for the immediate application of OWL techniques in the prediction and evaluation
of workload. Second, this chapter outlines the general approach that needs to be taken for selecting
OWL techniques. Twenty-three questions are presented which cover all major aspects of workload
technique selection.

The general approach illustrates the seemingly complex set of considerations which must be
addressed in selecting techniques. This general approach can best be implemented in a computerized
expert system. Through this means, the development community has access to a broad body of workload
knowledge which is distributed and accessed in a systematic and efficient manner. Both the system
developer and the workload analyst can easily identify the appropriate means to assess workload.

CHAPTER 9. CONCLUDING COMMENTS AND FURTHER DISCUSSION


The overall purpose of this report is to provide useful and practical information concerning operator
workload (OWL). This information is used not only for the evaluation of existing Army systems but also for
prediction of workload for future systems. Much of the material presented in the preceding chapters
represents a fairly comprehensive review of how researchers and practitioners have defined and
measured workload. In the review, we have presented and used traditional classification schemes (e.g.,
Hart, 1985a; O'Donnell & Eggemeier, 1986; Strasser, 1985) for organizing operator workload techniques.

A considerable amount of attention was devoted to explaining and defining workload. A number of
definitions of workload as used by researchers were considered. Workload has been defined in terms of
(a) the number of things to do, (b) the time required vs. the time available to do a task, and (c) the
subjective experience of the operator. After considering a number of performance issues and these
workload definitions, we developed the idea of a performance envelope in Chapter 2. The performance
envelope is a generalized explanatory concept; it contains the foundations for each of the three
definitions and allows for variations in performance within individuals and between individuals.
Performance can be described as a momentary point in space within the performance envelope. We
maintain that variations in operator workload, as well as other factors depicted in Figure 2-2, cause
displacement of the operator within the performance envelope. The proximity of the individual's position
to the boundaries of the envelope is indicative of the relative capacity to respond. It is this parameter of
operator workload that we deem to be of major interest to system designers.

The main body of the report is a review and analysis of techniques that have been used for assessing
OWL. These techniques were classified into two broad categories: the analytical category, which
contains predictive techniques that may be applied early in system design without the operator-in-the-loop,
and the empirical category, which consists of operator workload assessments that are taken with an
operator-in-the-loop during simulator, prototype, or system evaluations. This analytical/empirical
dichotomy is an important distinction in workload assessment. One objective of the present chapter is to
provide some additional discussion on this distinction as well as some additional general comments
related to operator workload.

The goal of workload assessment is to contribute to the processes that ensure acceptable system and
human performance. It is useful to categorize OWL techniques as objective vs. subjective or primary vs.
secondary; however, such categorizations tend to emphasize differences that are independent of the
goal of workload assessment. In fact, such distinctions may serve to cloud the issue. We have often made
the point in this report that multiple techniques should be used for a full OWL assessment. (Indeed, the
dissociation of subjective workload assessment results from empirical performance is ample basis for this
recommendation.) A more general and possibly more useful distinction is to consider operator workload
from a cause and effect standpoint; this has some important implications for estimating workload. We want
to draw out the implications by separating the determinants of operator workload from operator reactions
to that workload. It will be argued that the cause and effect approach parallels the analytical/empirical
distinction.

Operator workload research has tended to be atheoretical and this volume has been oriented
unabashedly toward practical application of the techniques and concepts to Army systems. But there is
nothing so practical as a workable theory. Accordingly, part of our discussion in this chapter revisits the
workload model (Figure 2-2) and considers the review of workload techniques from a slightly different
perspective from that generally presented.

The  Determinants  of  Operator  Workload 

Factors both external to and internal to the operator will determine the extent of workload. The external
factors include job requirements and job constraints that determine the workload of any job. The internal
factors include the operator's own resources and capabilities. Together, these factors form the basis for
analytical techniques; they are shown in Figure 9-1 and described below.

Job requirements. Job requirements will be a function of the types of tasks allocated to the
operator and the rapidity of occurrence of various events to which the operator must respond. The types
of tasks are important because they will determine the kinds of acts which an operator must be able to
produce while doing the job. The rapidity with which events occur will determine the frequency and the
time available to produce various acts. Further, the sequencing and timing of external events confronting
the operator will vary from moment to moment. Because of this, the workload associated with a job will also
change over time. The same observation could be made about the workload of a particular task or a
particular act. Although it may be true that a single set of values could be estimated for the average
workload of a job or task or act, doing so would ignore the fact that there are distributions of workloads for
jobs, tasks, and acts.
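
The difference between an average workload value and a distribution of workloads can be sketched with a
small Python example. It uses the time required vs. time available definition of workload mentioned
earlier; the act timeline, the 10-second window, and the function name window_loads are all invented for
illustration and do not come from any Army system.

```python
from statistics import mean

# Hypothetical timeline: (start_second, seconds_required) for each act.
ACTS = [(0, 4), (3, 5), (12, 2), (20, 6), (21, 6), (22, 5), (45, 3)]


def window_loads(acts, mission_length, window):
    """Ratio of time required to time available in each window.

    Each act's required time is charged to the window in which it starts,
    a simplification made for this sketch.
    """
    n_windows = mission_length // window
    required = [0.0] * n_windows
    for start, duration in acts:
        index = min(start // window, n_windows - 1)
        required[index] += duration
    return [r / window for r in required]


loads = window_loads(ACTS, mission_length=60, window=10)
print("per-window load:", loads)
print("average load:", round(mean(loads), 2))
print("peak load:", max(loads))
```

With this made-up timeline the average ratio is about 0.52, which looks comfortable, yet one window reaches
1.7, meaning more time is required than is available. A single average would hide that window.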

Job constraints. Job constraints include the resources furnished to the operator (e.g., the design
of the workstation, and the types of equipment and supplies the operator can use in performing the job).
For example, the extent to which the workstation has been engineered for the human can impact
significantly the workload of performing the job, and hence, the difficulties of various job-related acts.
Because the working status of various equipment items and the availability of supplies and services may
also vary from moment to moment, the job constraints will probably add to the overall variability of workload
and performance distributions.

Figure 9-1. Illustration of the determinants of operator workload.


Internal Determinants. Factors within the operator determining the extent of workload relate to the
various internal resources and capabilities that the operator carries into and applies to the job, tasks, and
acts. There are, of course, tremendous individual differences in the resources and capabilities of potential
operators. Appropriate selection, placement, and training of operators can be expected to result in
greater suitability of internal capabilities and resources of operators assigned to a given job and also
reduced variability of individual differences. Performance would be expected to improve as a result of a
reduction in workload. However, it is unreasonable to expect that all individual differences can be
eradicated by these means (e.g., Adams, 1987). Therefore, we must assume that even after appropriate
personnel actions have been taken operators will vary in their capabilities to perform various jobs, tasks,
and acts.

Interactions of External and Internal Factors. The extent of workload imposed on an operator will
be a function of the external job requirements and constraints and the operator's internal resources and
capabilities. These determinants differ from operator to operator, from day to day, and even from
moment to moment. Perhaps more importantly, tasks can interact. Although it is perhaps a poor example
because it could be memorized, the alphabetic/number interleaving task described in Chapter 2 is an
example of detrimental task interaction. Because of these variations, the estimation of extent of workload
for a given job, or even a given mission or mission segment, is neither a simple nor a straightforward
exercise. Certainly, workload cannot be evaluated by considering one task at a time without
understanding the context of other current tasks that will be required.

Implications for Analytical Prediction of Operator Workload

Task Analysis. There are several implications of the cause/effect distinction for the prediction of
workload. For example, enumerating the various operator tasks can probably be accomplished with
relative ease. Determining the frequency, sequencing, and especially the interactions of those tasks is far
more difficult. It requires careful and accurate determination of the external events which will probably
occur for a variety of mission situations as well as the careful and accurate determination of the probable
external resources that will be available to the operator.

Obtaining accurate estimates of the expected frequencies and sequences of the operator's tasks is,
however, only the starting point for workload analysis. One of the goals of traditional task analysis was to
arrive at estimates of the kinds of acts (e.g., perceptual, cognitive, psychomotor, communications)
required by those tasks. A second goal was to arrive at estimates of the times required by those acts (and
hence, by the tasks in which those acts occur). The times to accomplish various perceptual and
psychomotor acts will ultimately depend on the operator's workstation; accurate estimates of some times
are not possible without first determining the layout of the workstation. There remains much controversy
over these time estimates (e.g., Holley & Parks, 1987), especially when they are obtained in a subjective
manner and the estimators are not required to specify what assumptions they have made. To ameliorate
this problem, most analytical techniques have recommended using SMEs (e.g., actual operators) in a
standardized structured estimation process.

Performance Models. For obvious reasons, SMEs are generally preferred over novices. Some of
the techniques reviewed in Chapter 3 (e.g., HOS) reduce the use of SMEs to describing (a) the detailed
steps in each task and (b) the likely design of the crewstation displays and controls. HOS assumes that
SMEs can be relied upon to arrive at sufficiently valid descriptions of this type of information, but it does
not make the additional assumption that SMEs are necessarily good evaluators of either the acts or the
times that will be required by the detailed steps in each task. Indeed, we know that operators do not do
well in these regards (Spady, 1978a). Instead, HOS relies on a fairly complex, computerized human
operator model to accomplish those latter functions.

Determining accurate estimates of which operator acts will be required and when they will be required
are, themselves, complex problems. But, in reality, they are merely the beginning steps for predicting
human performance and estimating operator workload. The time needed to perform the acts and the
accuracy with which those acts can be accomplished will be a function of the capabilities and resources of
each individual operator. Thus, one can expect a distribution of relative workloads for different operators
just as one can expect a distribution of relative workloads for different missions and scenarios.

The time and accuracy to perform any given act will be dynamically changing from moment to moment.
Because the ultimate measure needed for workload is related to how close the individual is coming to the
edge of his acceptable performance envelope, estimating workload remains a highly challenging problem.


Measuring  Operators'  Reactions  to  Workload 

The  preceding  section  contained  discussions  about  the  various  determinants  and  causes  of  workload, 
on the analytical side of workload analysis. In that approach, one examines the conditions under which an
operator  will  be  required  to  perform  tasks  and  on  that  basis,  arrives  at  estimates  of  what  the  performance 
and  workload  should  be.  On  the  empirical  side,  we  examine  various  types  of  operator  reactions  to 
performing  a  job  and,  based  on  those  reactions,  arrive  at  estimates  of  what  the  workload  must  have  been. 
In  this  section,  therefore,  we  will  discuss  the  various  outcomes  and  effects  resulting  from  workload. 
However,  rather  than  use  the  more  traditional  four-level  taxonomy  presented  earlier,  it  is  suggested  that 
the  measurement  of  outcomes  of  workload  can  be  described  more  parsimoniously  in  tripartite  terms  of  the 
operators'  (a)  job-related  acts  (primary  and  secondary  task  measures),  (b)  concomitant  physiological 
changes,  and  (c)  subjective  reactions  engendered  as  the  operator  attempts  to  perform  the  assigned  job. 
Figure  9-2  illustrates  these  effects  of  operator  workload. 

Job-related Acts. Job-related acts are synonymous with job requirements and required acts
described in the discussion of workload determinants. Estimating when certain acts occur can be done
using either subjective or objective techniques. Some of these acts are observable and can be measured
in highly objective fashion. For example, head and eye movements and visual fixation can be determined
objectively by measurement of the eye. Limb movements, grasping and manipulating control devices,
and use of speech for messages can also be measured and timed very accurately by the primary and
secondary task measurement techniques. However, not all human job-related acts are directly
observable. EEG data indicate that internal events are continually occurring within the brain, but they
provide little information on what specific events are occurring. To a large extent ECPs and the variability
in heart rate may be considered as preliminary attempts to measure indirectly the occurrence of those
various internalized activities which are not directly observable.

Physiological Changes. Physiological changes represent the second type of reaction to the
workload an operator confronts. There are two broad subtypes of physiological changes: momentary and
long term. The momentary changes are represented by ECP techniques, pupil responses, eye
movements, and the like. These momentary changes have been well documented. There also is little
question in the long-term that a variety of biochemical byproducts are generated as the operator goes
about performing a job. The increase or decrease of certain chemicals in the body may be related to the
depletion and recovery of some of those resources needed to perform various acts. What we are
speaking of in the long-term case are not the various physiological indicators that a task-related act has
occurred but rather the concomitant physiological changes occurring because internal resources are
being depleted. Several of these types of physiological changes are discussed in Chapter 7. Because of
the complexity of the varying time delays involved in the underlying physiological processes, the
detectable changes have little diagnostic value at present for determining precisely which acts were
related to those changes.

Figure 9-2. The measurable reactions to workload. The assessment techniques are shown in the shaded
areas.

Operator Experiences. Subjective experiences are the third type of reaction to the workload
confronting an operator. We know that some types of experiences (e.g., anxiety, fear, fatigue, confusion,
frustration, anger, failure) encountered in work situations are correlated with various concomitant
physiological changes. We also know that some types of experiences may be correlated with the
operator's level of performance. Thus, there is an obvious overlap between an operator's reactions to
workload we classify as an experience and the other two workload effect categories. However, the
overlap is not perfect and sometimes the correlation is even negative. This has been called dissociation
by various investigators (e.g., Derrick, 1988; Yeh & Wickens, 1984).

A related problem is that the kinds of experience frequently reported by workload researchers are
correlated among themselves. Thus, it is likely that a person who reports experiencing confusion may also
report feeling frustrated. If experiences are important effects of workload, we need to isolate their
independent dimensions. Subjective experiences are also likely to undergo continuous changes during
a mission. If one waits until the end of a mission to collect data on subjective experiences, it is possible
that earlier experiences may have been forgotten. It is also likely that subjective reports will be influenced
by the perceived outcomes of the mission. For example, when a mission is ultimately successful,
operators may tend to play down the importance of an earlier feeling of confusion. By the same token, if
the outcome of a situation is a failure, the operator, even though he never actually felt that way during the
mission, may now recognize that he must have been confused. This could lead to the operator reporting
confusion even though he never really felt that way.

A final point is that information is sometimes solicited under the guise of the operator's subjective
experiences when, in reality, the information we want is the operator's evaluation and judgment
concerning the goodness of the design of the system. A relevant question, then, is whether operators
would arrive at the same conclusions about a system design without ever being asked about their experiences.
From a designer's point of view, it will almost always be more informative to know specifics about tasks or
components of the system with which operators experience difficulty, than just to know that an operator
experienced an overload. Additional useful and diagnostic information can be collected by eliciting direct
information from operators about how the system might be improved.

The Relations between the Determinants and Effects of Workload

It should be possible to estimate the extent of workload by examining either the determinants or the
effects of the workload. For a given situation and individual, one would expect the two approaches to
yield similar answers to the question of what the extent of workload was. Yet it is entirely possible that
they will not. The major reason can be described in terms of the parallel distinction between analytical and
empirical techniques and the capability of predicting vs. measuring performance. The most obvious
overlap between the two approaches lies in the area of job-related tasks and their performance by a given
operator. That is, a full understanding of the task demands, situational constraints, and capabilities and
limitations of a given operator should yield acceptable predictions of what acts will occur, when they will
occur, and how well they will be executed in that situation. This desired agreement between predicted
and actual acts is one of the major goals of the development of human performance prediction models. It
may be argued that validation of all of the predicted acts cannot be obtained because some acts are simply
not observable. There is, however, a partial answer to this objection. If there is acceptable
correspondence between the predicted and actual observed acts and events, then the model is probably
accounting for the times required by the unobservable acts.
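
The comparison of predicted and observed acts described above can be sketched in Python as follows. The
act names, onset times, and the one-second tolerance are hypothetical values chosen only to show the
bookkeeping; the text does not define what counts as acceptable correspondence.

```python
# Hypothetical act onset times in seconds; both records are invented.
predicted = {"detect target": 2.0, "select weapon": 5.5, "press trigger": 7.0}
observed = {"detect target": 2.4, "select weapon": 5.1, "press trigger": 7.9}


def timing_errors(predicted, observed):
    """Absolute timing error for every act present in both records."""
    shared = predicted.keys() & observed.keys()
    return {act: abs(predicted[act] - observed[act]) for act in shared}


def within_tolerance(errors, tolerance):
    """True when every observable act was predicted within the tolerance."""
    return all(error <= tolerance for error in errors.values())


errors = timing_errors(predicted, observed)
print(errors)
print("acceptable:", within_tolerance(errors, tolerance=1.0))
```

Only acts that appear in both records are compared, which mirrors the point that unobservable acts can be
validated only indirectly, through the timing of the observable acts around them.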

The determinants approach to assessing OWL suggests that various internal resources must be
depleted as acts take place. The effects approach suggests that some of the detectable physiological
changes might be indicative of the fatigue and recovery of those same internal resources. Thus, there is a
second way in which the two approaches might be shown to correspond. It might well be, however, that
some internal physiological changes cannot be detected or observed with the present technology.
Theoretically, the depletion and recovery of various internal resources are responsible for changes in the
level of performance of the job-related acts. Thus, if the human performance model correctly predicts
when the performance of various acts will degrade or improve, then it can be assumed that the depletion
and recovery of internal resources is being accounted for. Current human performance models do not
include provisions for the depletion and recovery of internal resources needed for the production of
various acts. However, there is nothing that prevents that concept from being included.

The determinants approach, unlike the effects approach, fails to consider an operator's internal
experiences. If some of the subjective experiences are considered to be the results of the operator's
perceptions, then they also could be modeled. For example, anxiety and fear could be assumed to occur
when the simulated human assesses the situation in which he finds himself as being threatening.
Confusion could be assumed to occur when the simulated operator cannot solve problems as rapidly as
he normally can or when processed information (perceived and/or recalled) is found to be contradictory.
Feelings of physical or mental fatigue could be assumed to occur when the corresponding acts have been
required over a sustained period of time. Even feelings of being overloaded could be assumed to occur
whenever the model of human performance indicates a recognition of job demands outpacing available
time. Thus, it is conceptually feasible (though a major undertaking) to construct human performance
models that would also include the generation of subjective experiences as well as the performance of the
assigned tasks. As it would have to include various introspective tasks along with the job-related ones,
such a model would be more complex than existing ones. It might be, however, that ultimately such
models would be more accurate in predicting actual job performance because they could account for the
internal motivations of the operator.
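
The rules suggested in the preceding paragraph can be written down as a minimal Python sketch. It is not
drawn from HOS or any existing human performance model; the OperatorState fields, the experiences function,
and the 120-minute fatigue threshold are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class OperatorState:
    """Snapshot of a simulated operator; every field is hypothetical."""
    time_required: float      # seconds of work currently demanded
    time_available: float     # seconds available to do that work
    minutes_on_task: float    # sustained duration of the current activity
    contradictory_info: bool  # perceived or recalled information conflicts
    threat_perceived: bool    # situation assessed as threatening


def experiences(state, fatigue_after_minutes=120.0):
    """Map a state to the experiences the text suggests could be modeled."""
    felt = []
    if state.threat_perceived:
        felt.append("anxiety")
    if state.contradictory_info:
        felt.append("confusion")
    if state.minutes_on_task >= fatigue_after_minutes:
        felt.append("fatigue")
    if state.time_required > state.time_available:
        felt.append("overload")
    return felt


state = OperatorState(30.0, 20.0, 150.0, False, True)
print(experiences(state))
```

A real model would need graded rather than all-or-nothing experiences, and the thresholds would themselves
have to be validated against operator reports.
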

The cause (determinant) and effect (reaction) approach is nothing more than looking at two sides of the
same coin. Researchers have done this implicitly in developing and applying analytical techniques by
often using data from empirical techniques. But they have simply not gone far enough. Too few of the
analytical techniques have been sufficiently validated. Without thorough validation, we do not know if we
have a good, practical technique or an interesting, untested theory. The lack of validated analytical
information on OWL suggests looking at both sides of the coin. As a result of converging operations
(Chapter 2) a clearer picture can be obtained from several uncertain views. This is the predominant
rationale for our advocacy of OWL technique batteries.

Future  Directions 


Having  considered  virtually  every  workload  assessment  technique  and  thereby  having  obtained  a  rare 
overall  perspective,  we  would  be  remiss  not  to  highlight  several  gaps  in  the  technologies  and  content  of 
workload  research.  Of  course,  many  things  need  to  be  done  to  create  more  applicable  and  validated 
workload  tools  and  techniques.  Analytical  techniques  in  particular  represent  an  area  containing  many 
technological gaps. This has been considered in the discussion of Chapter 3 as well as in this chapter.
Two major areas of OWL research that need further study are multiple task performance and individual
differences. Each of these topics has an impact not only on operator workload evaluation, but also on
performance and other areas of MANPRINT concern.

Estimating Workload for Multiple Tasks and Multiple Situations


A good deal of the laboratory workload research has dealt with single or dual task experiments
occurring  within  a  single  or  possibly  two  different  sets  of  task  conditions  or  situations.  Although  the 
results  of  that  research  have  been  interesting,  most  Army  jobs  of  interest  have  multiple  ongoing  tasks. 
Further,  as  pointed  out  earlier,  even  though  the  nature  of  an  operator's  tasks  may  be  similar  from  one  day 
to  the  next,  the  relative  difficulty  of  the  various  mission  situations  confronted  may  change  significantly  on  a 
day-to-day basis. Figure 9-3 illustrates the two dimensions of number of concurrent tasks and number of
different  situations  in  which  the  tasks  occur.  This  figure,  of  course,  is  an  abstraction.  In  reality  separating 
discrete  elements  of  tasks  or  situations  may  be  difficult. 

Figure 9-3. Relation of number of tasks and number of situations.

Despite  the  fact  that  most  empirical  data  collected  on  operator  workload  come  from  the  lower  left  cells  of 
Figure 9-3, the major interest for future workload research will be the upper right multitask, multisituation
cell. In many experiments, subjects have been required to perform only a single task. In this case, it is
relatively easy to determine how well the subject has performed the job. When a second task is added to
the job, as in the case of dual-task studies for example, it is much less clear how one should evaluate the
overall performance of the job. Yet, it is clear that job performance and operator workload cannot be
evaluated by examining only the performance on the primary task, ignoring time-sharing requirements.
Rather, overall performance on all ongoing tasks must be considered in arriving at estimates of workload.
How this is to be done is one of the most challenging issues, not only for workload assessment, but also
for  overall  performance  prediction  and  evaluation. 

An advantage to collecting data on multiple situations is that not only can measures of interesting
parameters for each task be obtained for each situation, but they can be compared across the various
situations. For example, our discussions in earlier chapters indicated that changes in the frequency or
extent of certain kinds of acts from one period to another might be indicative of the operator applying a
different set of performance rules for the same acts in different situations. The technique of adding a
secondary task can best be understood in the context of changing the situation in order to see how it
affects performance on all of the other tasks. The issue of predicting the impact of additional tasks on
overall job performance is, of course, central to the whole problem of allocating tasks to an operator. The
concept of manipulating the difficulty of any task to see how it will impact overall job performance is also a
useful technique. Adding new tasks or increasing the difficulty of existing tasks are alternative techniques
for determining the location of the operator in his performance envelope. In the process of evaluating
operator workload, we do not yet fully understand where the boundaries of acceptable workload are for
the human. Incrementally adding to the workload until performance begins to deteriorate or to break down
is similar to methods used in the physical sciences for testing the tensile strength of various materials.
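
The incremental-loading idea can be illustrated with a short sketch. It is only a hypothetical illustration, not a procedure taken from the literature reviewed here: the function name, the use of the first few load levels as a baseline, and the 10 percent tolerance are all assumptions, and performance is assumed to be scored so that higher values are better.

```python
def find_breakdown_load(loads, scores, baseline_n=3, tolerance=0.10):
    """Return the first load level at which performance deteriorates.

    Hypothetical rule (an assumption, not from the report): performance has
    "begun to deteriorate" when a score falls more than `tolerance` (as a
    fraction) below the mean score of the first `baseline_n` load levels.
    Returns None if no tested load level crosses that threshold.
    """
    if len(loads) != len(scores):
        raise ValueError("loads and scores must have the same length")
    if baseline_n < 1 or len(scores) < baseline_n:
        raise ValueError("not enough observations to form a baseline")

    # Mean performance at the lowest load levels serves as the reference.
    baseline = sum(scores[:baseline_n]) / baseline_n
    threshold = baseline * (1.0 - tolerance)

    # Step through the remaining, progressively heavier load levels.
    for load, score in zip(loads[baseline_n:], scores[baseline_n:]):
        if score < threshold:
            return load
    return None


# Usage: number of concurrent tasks versus proportion of correct responses.
loads = [1, 2, 3, 4, 5, 6]
scores = [0.95, 0.96, 0.94, 0.93, 0.80, 0.60]
breakdown = find_breakdown_load(loads, scores)
```

A returned value of None means only that the boundary was not reached within the loads tested; it says nothing about how close the operator came to it.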

We have stressed the importance of not only estimating what operator performance will be under a
variety of expected mission scenarios, but also knowing how close it will come to the boundaries of
unacceptable performance during those missions - even if every operator's performance were completely
acceptable.  The  purpose  in  doing  this  is  to  understand  better  the  margin  of  error  in  a  proposed  design  of 
a  new  system.  That  margin  of  design  error  is  especially  important  because  future  demands  of  any  job  may 
well  be  more  difficult  than  originally  anticipated  and  new  tasks  may  have  to  be  added  to  counteract 
technological  advances  or  doctrinal  and  tactical  changes  in  the  employment  of  hostile  forces. 

The  application  of  improved  knowledge  about  performance  in  multitask  situations  will  clearly  benefit  the 
system designer and impact not only workload evaluation but also a variety of MANPRINT issues. The
designer  will  benefit  by  being  able  to  improve  designs  and  optimize  task  allocation.  The  trainer  will  benefit 
by  having  a  better  understanding  of  performance  rules  and  which  components  need  more  emphasis.  The 
trainer  will  also  benefit  by  being  able  to  teach  time-sharing  skills.  The  suggested  approach  is  clearly 
interdisciplinary. The need for considering diverse sets of data from neuropsychology, from individual
difference  research,  from  performance  research,  from  human  modeling  and  artificial  intelligence,  and  from 
mathematical  modeling  is  simply  too  much  for  any  single  researcher  to  master. 

Attention and Switching Among Tasks. A general conclusion from this review is that a full
understanding  of  operator  workload  will  begin  to  emerge  only  when  sufficient  workload  investigations 
have  emphasized  multiple  tasks  and  multiple  situations.  The  suggestion  of  looking  at  multiple  tasks  and 
multiple  situations  is  a  general  plea,  however,  and  we  can  be  more  specific.  Because  multitask  situations 
are  common  occurrences  for  an  operator  currently  and  may  well  become  even  more  common,  we  need 
more  information  and  data  about  multitask  performance  and  an  understanding  of  the  relations  among  and 
impact of individual tasks on multitask performance. In particular, the issue of time-sharing abilities comes
into focus in this context.

The  importance  of  the  interactions  of  two  or  more  tasks  on  performance  cannot  be  overestimated. 
Mental workload often increases when two or more tasks are to be performed concurrently. This is
certainly  not  surprising  from  several  different  theoretical  approaches.  We  prefer  an  explanation  involving 
attention  switching  for  the  following  reasons.  First,  if  one  assumes  that  operators  can  consciously  attend 
to  only  one  thing  at  a  time,  the  multitask  situations  require  operators  to  decide  which  task  should  be 
attended to at various points in time. This additional mental task clearly does not exist in single task
situations. Such decisions, especially in rapidly unfolding combat situations, are far from trivial. The
operator may well feel torn between working on several critical tasks, each of which is currently demanding
his attention. Second, especially when tasks are considered to be approximately of equal importance,
multitask situations may result in frequent interruptions of the current task to determine the need to work
on the others. Even if the current task is considered the most critical, evaluating the status of
the other tasks takes mental time and effort. Third, the interruption of a task means that some time elapses
during which the interrupted task is not consciously attended to. When the operator returns to the
interrupted task, he may be surprised at what he now finds. The actions required may be somewhat
different from what he had anticipated, thus requiring yet additional mental effort. Finally, the continual
switching among several tasks may require some additional time to reestablish the contents of the
operator's working memory with the current situational data and the procedural rules for the task currently
being  worked  on. 

There  are  a  number  of  experiments  supporting  the  attentional  switching  conceptualization.  Mewhort, 
Thio,  and  Birkmayer  (1971)  used  dichotic  listening  and  showed,  independently  of  other  factors,  that  the 
number  of  required  switches  had  a  dramatic  impact  on  recall.  In  a  different  context,  Weichselgartner  and 
Sperling  (1987)  concluded  that  attention  consists  of  two  partially  concurrent  processes.  One  is  a  fast, 
effortless, automatic process (on the order of 100 ms) and the other is a slower, effortful, controlled process
(on the order of 300-400 ms). The faster process is affected by manipulations often considered as parallel
processes that are independent of task difficulty while the slower is affected by variables typically
considered  as  serial  processes  and  related  to  task  difficulty  as  well  as  training  and  practice. 

Although  data  bearing  on  attention  and  switching  problems  are  available  from  dual  task  investigations, 
many  of  the  experiments  reported  in  the  literature  have  used  tasks  (both  primary  and  secondary)  that  have 
little similarity with real world tasks. Attentional decisions as to which of the two tasks to work on in those
situations are trivial when compared to most real-world situations of interest. Indeed, conclusions drawn
from  those  types  of  investigations  may  simply  be  irrelevant  to  the  real  problems  confronted  in  designing 
complex  human-machine  systems.  Our  earlier  statement  that  future  experiments  should  investigate 
multiple  tasks  performed  in  multiple  situations  or  under  multiple  conditions  includes  the  collection  of 
relevant  data  for  understanding  attention  switching  problems. 

The Need for New Metrics. As more realistic multitask multisituations are investigated, the issues
of performance and workload trade-off, and how they can be handled, will become more and more
apparent  and  more  pressing.  New  metrics  are  needed  to  facilitate  more  precise  predictions  about  the 
trade-offs. One metric proposed is the performance operating characteristic (POC) (Norman & Bobrow,
1975). The POC is a way of representing the data obtained from two tasks done individually and in
combination. Under the multiple resource theory (Navon & Gopher, 1979), Wickens, Mountford, and
Schreiner (1981) have developed a normalization scheme which they claim provides such a metric.
Essentially, their recommendation is to normalize dual task performance to single task performance.
Kantowitz and Weldon (1985), however, have shown some of the dangers of using such a procedure.
Through simulation, they have shown that erroneous conclusions could be drawn from the application of
such a transformation. Further clarification has been offered by Wickens and Yeh (1985). Until these
issues are settled, the POC may be a useful way to present data but its application should be extended
only with care.
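
As a rough illustration of what normalizing dual task performance to single task performance can mean, the sketch below expresses each dual task score as a simple ratio to the corresponding single task score and lists the resulting POC points. This is one simple reading of the idea, not the published Wickens, Mountford, and Schreiner scheme; the function names are hypothetical, scores are assumed to be "higher is better", and the single task endpoints are assumed to carry a zero for the task not performed.

```python
def normalize_to_single(single_score, dual_score):
    """Express a dual task score as a fraction of the single task score.

    Assumes higher scores mean better performance (an assumption made for
    this sketch); 1.0 means no dual task decrement.
    """
    if single_score == 0:
        raise ValueError("single task score must be nonzero")
    return dual_score / single_score


def poc_points(single_a, single_b, dual_pairs):
    """Build normalized POC points for tasks A and B.

    `dual_pairs` is a list of (score_a, score_b) tuples, one per dual task
    condition (for example, one per priority instruction). The two single
    task conditions are added as endpoints (1.0, 0.0) and (0.0, 1.0).
    """
    points = [(1.0, 0.0)]  # task A performed alone
    for score_a, score_b in dual_pairs:
        points.append((normalize_to_single(single_a, score_a),
                       normalize_to_single(single_b, score_b)))
    points.append((0.0, 1.0))  # task B performed alone
    return points


# Usage: single task scores of 80 (A) and 50 (B); one dual task condition.
points = poc_points(80.0, 50.0, [(60.0, 40.0)])
```

Consistent with the caution raised by Kantowitz and Weldon, ratios of this kind are a convenient way to display data rather than a settled measurement scale.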

It may turn out, as suggested by Pachella (1974), that speed/accuracy trade-offs and other similar
performance trade-offs should be handled with weaker scales of measurement (e.g., ordinal) and not the
interval scales currently attempted. Other mathematical techniques may also be useful, such as
correlation, conjoint measurement, and factor analysis. Regardless of the ultimate nature of the metrics
needed,  it  is  obvious  that  much  work  currently  remains  to  resolve  this  problem. 

Individual  Differences,  Performance,  and  Workload 

Our  account  started  in  Chapter  1  with  a  quote  from  a  little  book  that  dealt  with  testing  for  individual 
differences  and  it  is  fitting  to  close  with  some  comments  on  the  same  topic.  The  usefulness  of  personnel 
testing during World War I was clearly demonstrated and documented in that 1920 reference; such testing
has continued to the present. Although such information is highly useful, it is useful only in a broad
sense.  It  does  not  provide  the  detailed  information  needed  to  predict  operator  performance  precisely. 
We  can  ill  afford  to  build  a  system  and  then  determine  whether  soldiers  can  operate  it.  To  use  analytical 
techniques  in  a  more  beneficial  manner  to  determine  performance  before  the  system  is  built,  it  is 
necessary  to  have  a  considerable  amount  of  detailed  information.  In  many  ways,  this  approach 
complements the multitask approach.

It is not that individual differences have been ignored in the experimental literature; rather, they have been
consciously set aside in favor of examining population means. (This is by no means a new point [e.g.,
Noble, 1961; Ozier, 1980].) About the only information available from many experimental reports related
to individual differences is the information contained in the error terms and the subject term of analysis of
variance or in the means and standard deviations which are sometimes presented. Further, much of the
research conducted in university laboratories has dealt with a highly restricted population. Accordingly,
even if information were presented about individual differences, one usually does not have information
about differences for the population of Army operators. While most investigators support the importance
of individual differences, one can find only a few volumes (e.g., Eysenck, 1977; Miles, 1936) that deal with
individual variations in an experimental context and provide data needed for the analytical techniques.
Fortunately,  the  situation  is  not  as  stark  in  other  areas  of  research. 

Any  consideration  of  individual  differences  and  mental  workload  lands  one  squarely  in  the  domain  of 
intelligence. Many years ago, the neuropsychologist Karl Lashley (1929) outlined several major
theoretical positions on intelligence. Two of these general theories are extant: the general plus special
abilities theory (e.g., Spearman, 1927) and the specific abilities theory (e.g., Thurstone, 1938). The
general theory holds that there exists one general intelligence factor plus some ability specific to the test
used. By contrast, the specific theory holds that intelligence is the algebraic sum of a number of diverse
capacities. There are, of course, many variations of these two classes of theory; the theory subscribed to
can have tremendous implications for the approach taken toward individual difference research. Much of
the  research  in  individual  differences  in  abilities  has  utilized  factor  analytical  approaches. 

Emphasis on Underlying Acts. Akin to traditional factor analysis are several theories and
approaches which emphasize the various capacities and resources that underlie performance on many
different tasks. Work on attention has emerged from information processing and cognitive theories about
behavior. Coupled with the attention work is the evaluation and identification of mental acts involved in
performance.  Typically  these  approaches  have  not  focused  on  individual  differences  but  there  is  no 
reason  why  they  cannot  be  extended.  Navon  and  Gopher  (1979)  postulated  that  different  amounts  and 
types  of  resources  are  required  for  different  task  combinations  (cf.  Navon,  1984).  Wickens  (1984), 
building  on  the  information  processing  approach,  has  formulated  this  idea  into  a  relatively  few  general 
dimensions  (e.g.,  verbal,  spatial,  auditory,  visual,  speech,  motor)  to  deal  with  the  multiple  task  problem. 
There  are  a  number  of  other  ways  of  examining  behavioral  detail  associated  with  mental  acts.  Researchers 
have generated a considerable amount of data relevant to the issue using a number of approaches,
including perceptual processing (Garner, 1974), brain damage (Luria, 1966), skill learning (Adams, 1987),
and  the  nature  of  intelligence  (Guilford,  1967). 

In  his  Underlying  Internal  Processes  (UIPs)  theory,  Wherry,  Jr.  (1986)  emphasizes  individual 
differences  and  postulates  that  most  differences  in  task  performance  are  attributable  to  the  number  of 
times different UIPs must be invoked for a given task and how fast and how accurately different UIPs are
performed by those individuals. Based on the established moment to moment reliabilities of task
performance, he maintains that the time and accuracy of given UIPs within individuals must also remain
fairly constant. He presents a methodology by which the number and nature of the different UIPs required
for  given  tasks  can  be  identified  by  the  analysis  of  the  correlations  among  task  completion  times  across 
many  variations  of  the  task  of  interest.  His  analysis  also  permits  the  estimation  of  the  individuals’ 
capabilities  for  the  identified  UIPs. 
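
The starting point for an analysis of this kind, a table of correlations among completion times across task variations, can be sketched as follows. This shows only that preliminary step and is not Wherry's methodology; the function names and the data layout (one list of completion times per task variation, with the same individuals in the same order in every list) are assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant list")
    return sxy / sqrt(sxx * syy)


def correlation_matrix(times_by_variation):
    """Correlate completion times between every pair of task variations.

    `times_by_variation` maps a variation name to a list of completion
    times, one per individual, in the same individual order for every
    variation (a layout assumed for this sketch).
    """
    names = sorted(times_by_variation)
    return {(a, b): pearson(times_by_variation[a], times_by_variation[b])
            for a in names for b in names}


# Usage with made-up completion times (seconds) for four individuals.
times = {
    "variation_1": [1.0, 2.0, 3.0, 4.0],
    "variation_2": [2.0, 4.0, 6.0, 8.0],
    "variation_3": [4.0, 3.0, 2.0, 1.0],
}
matrix = correlation_matrix(times)
```

Variations whose completion times correlate highly across individuals would be candidates for sharing underlying processes; identifying the number and nature of those processes requires the further analysis described in the source.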

These  approaches  have  much  in  common.  They  share  with  traditional  factor  analysis  approaches  to 
intelligence research the concept that individual differences in task performance are attributable to and
explainable by the differences in humans' capabilities to perform various kinds of
underlying mental acts. As such, they also share much in common with the required acts as explained in
our recasting of analytical and empirical workload estimation approaches. Thus, we also conclude that
much added attention must be paid to quantifying individual differences in underlying capacities if workload
estimation  is  to  progress  to  a  mature  and  useful  technology. 

Skills and Performance Rules. In addition to the approaches already mentioned, there are several
other directions individual difference research might take. One of these is in the acquisition of skill and
how tasks and acts become automated. The other is in the performance rule/strategy domain. Individuals
differ not only in their underlying capabilities, but also in the knowledge they may bring to bear on various
problems. Ozier (1980) has shown clearly the role of performance rules in free recall. The differences
found are quite striking even with a restricted population of college students. Ozier suggests that these
organizational rules (or strategies) are independent of scores on several intelligence tests. Whether
application of performance rules is independent of or related to general abilities is of tremendous practical
importance to operator workload issues. Independence implies no predictability and without predictability,
all  of  our  performance  models  are  inadequate. 

Project  A.  Yet  another  approach  to  individual  differences  is  represented  in  Project  A.  This  is  a 
large  scale  program  undertaken  by  the  Army  Research  Institute  to  supplement  the  ASVAB  for  the 
purpose of improving the prediction of successful performance in both training and on the job. Peterson
(1985)  provides  an  overview  of  the  steps  taken  to  develop  additional  personnel  tests.  The  overall 
program  involves  not  only  developing  new  tests  but  also  validating  these  tests  against  criterion  soldier 
performance.  When  completed,  the  database  will  contain  a  considerable  amount  of  information  of 
relevance to performance. Of particular importance is the fact that some of the new tests being developed
are computer-based performance tests in which information being evaluated by subjects can be
dynamically changing and performance patterns and performance times can be measured. Thus, it may
be possible to assess underlying processes not testable by the typical paper and pencil methods.

And Finally, It Really Is an Elephant!


It  is  difficult,  perhaps  impossible,  to  summarize  a  volume  of  this  size  in  a  few  well  chosen  sentences. 
The  reviews  of  workload  definitions,  techniques,  and  approaches  represent  a  massive  effort  rarely 
undertaken.  After  having  the  opportunity  to  examine  the  great  diversity  of  those  definitions,  approaches, 
and workload estimation techniques, we have been struck by the fact that we have, indeed, been looking
at  the  same  elephant. 

One objective of this final chapter was to provide a further synthesis of the materials from the preceding
chapters.  We  have  attempted  to  illustrate  and  clarify  the  very  real  overlap  between  traditional  analytical  and 
empirical  workload  techniques  with  our  discussion  of  the  causes  and  effects  of  workload. 

A second objective was to assess briefly past research effort and indicate future directions that will
strengthen the body of knowledge upon which more coherent and encompassing models of operator
workload can be constructed. To this end, we have advocated a much greater emphasis on multitask and
multisituation  investigations  as  well  as  greatly  expanded  interest  in  individual  differences. 

A final objective of this chapter was to emphasize that determining the extent of workload is not an end
in itself. It is, however, a necessary step in determining the position of operators within their own
performance envelopes, which, in conjunction with the nearness of the boundaries of those envelopes,
provides an indication of the operator's momentary relative capacity to respond. It is this parameter, more
than any other, that system designers require if they are to build adequate man-machine systems.

The  importance  of  understanding  the  level  of  operator  workload  is  clear:  High  workload  may  result  in 
unexpected  and  undesirable  performance  changes.  The  operator  may  shed  tasks,  be  unable  to  perform 
them, or in some other way fail to perform acceptably. In one form or another, rightly or wrongly, the
operator will adapt. Without such consideration, the incorporation of MANPRINT concerns into the design
of systems will continue to be problematical.

APPENDIX A. LITERATURE REVIEW OF SECONDARY TASKS


The approach taken to review the vast secondary task literature was to identify any relationships
that  may  exist  between  secondary  task  characteristics  and  primary  task  characteristics.  That  is,  we 
classified  the  types  of  secondary  tasks  and  primary  tasks  that  have  been  reported  in  the  literature.  We 
then  examined  the  results  of  such  studies  based  on  the  various  secondary  and  primary  task 
configurations.  Our  reasoning  behind  this  effort  was  to  address  a  pragmatic  question  that  other 
researchers have recognized as being very important but overlooked (Chiles & Alluisi, 1979). That is, are
operators capable of performing two different tasks concurrently? To illustrate, if either the primary or
secondary task exhibits a decrement in performance when they are performed jointly, this finding, at the
very least, suggests that operators find it difficult to perform such a dual task configuration. Even though
such findings may violate methodological assumptions needed to draw inferences about spare capacity,
they provide valuable information about the ability of operators to exhibit time-sharing ability between two
tasks. (See Gopher and Donchin [1986] for an overview of the literature that specifically addresses
time-sharing ability.) To our knowledge, there have been no published reports that have followed such a
scheme for the secondary task literature. Ogden et al. (1979) have provided a basis for such an analysis,
but they did not actually complete the analysis.

We  recognize  that  even  though  the  results  from  this  analysis  may  suggest  that  certain  dual  task 
configurations  result  in  primary  task  performance  decrements,  it  is  misleading  to  suggest  this  will  be  the 
case in every situation. Rather, it is our intention to alert human factors practitioners to consider the
implications of this analysis with respect to their particular situation. This is important as technological
advances  have  increased  the  complexity  of  systems  such  that  operators  are  routinely  required  to  perform 
more  than  one  task  at  any  time.  The  secondary  task  paradigm  is  a  controlled  analog  of  this  situation. 

We  were  also  interested  in  identifying  any  trends  from  this  analysis  that  suggested  performance 
changes  that  are  sensitive  to  secondary  task  characteristics  as  a  function  of  primary  task  characteristics. 

Identification  of  Secondary  Task  Literature 

The primary reference sources used to identify relevant secondary task articles published prior to
1980 were Ogden et al. (1979) and Wierwille and Williges (1980). For relevant articles published after
1979, we identified key people in this area through our OWL Information System database and sent
requests to such individuals for their most recent articles on operator workload. We also searched relevant
journals (e.g., Human Factors) and proceedings of conferences and meetings (e.g., NASA/University
Conferences on Manual Control) for recent studies. As a result of this effort, we were able to obtain 147
studies for review. Of these, seven were excluded from our analysis because they lacked sufficient
information to interpret their results. Four studies were also excluded because they dealt with multiple
task batteries in which no attempt was made to examine dual task performance. We were left with 136
articles and 101 experiments to be analyzed. This literature base is a comprehensively representative
sample  of  the  secondary  task  literature. 


Classification Scheme for Primary and Secondary Tasks


Classification of secondary and primary task characteristics was attempted following the major
classes reported in the literature (O'Donnell & Eggemeier, 1986; Ogden et al., 1979). However, due to
the variety of secondary and primary tasks that have been employed in studies, it was necessary to
expand previous classification schemes as well as to identify particular tasks that have received extensive
use (e.g., the Sternberg memory task). Listed below is the classification scheme we developed with
descriptions for each category. This list represents the entire range of 26 tasks that we were able to
identify for secondary and primary task characteristics based on our review.

•  Choice  Reaction  Time  Task  -  the  subject  is  presented  with  more  than  one  stimulus 
and  must  generate  a  different  response  for  each  one.  Visual  or  auditory  stimuli  may 
be employed and the response mode is usually manual. It is theorized that choice
reaction time imposes both central processing and response selection demands.

•  Simple  Reaction  Time  Task  -  the  subject  is  presented  with  one  discrete  stimulus 
(either  visual  or  auditory)  and  generates  one  response  to  this  stimulus,  minimizing 
central  processing  and  response  selection  demands. 

•  Driving  Task  -  the  subject  operates  a  driving  simulator  or  actual  motor  vehicle.  Such 
a task involves complex psychomotor skills.

•  Randomization  Task  -  the  subject  must  generate  a  random  sequence  of  numbers, 
for  example.  It  is  postulated  that  with  increased  workload  levels  subjects  will  generate 
repetitive  responses  (i.e.,  lack  randomness  in  responses). 

•  Tracking Task - the subject must follow or track a visual stimulus (target) which is
either stationary or moving by means of positioning an error cursor on the stimulus
using a continuous manual response device. Central-processing and motor
demands vary depending on the order of control dynamics for the device used by the
subject  to  control  the  error  cursor. 

•  Monitoring Task - the subject is required to maintain attention to a visual display and
to  detect  the  occurrence  of  a  stimulus  (signal)  from  among  several  alternatives 
(neutral  events).  The  task  is  not  intermittent  but  continuous.  Monitoring  tasks  are 
generally  assumed  to  impose  a  heavy  load  on  perceptual  processes. 

•  Time Estimation Task - the subject keeps track of time either by generating a specific
time interval or by estimating the duration of a time interval at its conclusion. Typically,
subjects are required to generate 10 second time intervals (time production
procedure) and it is assumed under high workload conditions that subjects will
underestimate the passage of time as reflected by their responses (i.e., longer time
estimates). 

•  Memory  Task  --  there  are  a  variety  of  memory  tasks  which  employ  a  number  of 
different  types  of  materials  and  specific  requirements.  For  example,  the  subject  is 
required  to  recall  in  any  order  a  list  of  words  previously  memorized  (free  recall 
paradigm) or is required to recognize previously memorized words from a list of words
(recognition  recall  paradigm).  These  tasks  are  typically  assumed  to  impose  heavy 
demands on central-processing resources.

• Mental Mathematics Task - the subject must perform mental arithmetic operations
such  as  addition,  subtraction,  and  multiplication.  These  tasks  are  generally 
considered  to  place  heavy  demands  on  central-processing  resources. 

• Michon Interval Production Task - the Michon paradigm of interval production
requires the subject to generate a series of regular time intervals by executing a motor
response (e.g., a single finger tap every 2 sec.). No sensory input is required. This
task is thought to impose heavy demand on motor output/response resources. It has
been demonstrated with high demand primary tasks that subjects exhibit irregular or
variable tapping rates.

• Sternberg Memory Task - the Sternberg memory task is a commonly used memory
task. The subject is presented with a set of digits or letters to memorize.
Subsequently, the subject is presented with a test digit or letter and must judge
whether this item was contained in the previously memorized set. It is theorized that
the Sternberg memory task aids in workload assessment by distinguishing
primary task central processing effects from primary task stimulus encoding/response
execution effects.

•  Lexical  Decision  Task  -  typically,  the  subject  is  briefly  presented  with  a  sequence  of 
letters  and  must  judge  whether  this  letter  sequence  forms  a  word  or  a  non-word.  This 
task  is  thought  to  impose  heavy  demands  on  semantic  memory  processes. 

•  Distraction  Task  -  the  subject  performs  a  task  which  is  executed  in  a  fairly  automatic 
way  such  as  counting  aloud.  Such  a  task  is  intended  to  distract  the  subject  in  order  to 
prevent  the  rehearsal  of  information  that  may  be  needed  for  the  primary  task. 

• Problem Solving Task - the subject engages in a task which requires verbal or spatial
reasoning. For example, the subject might attempt to solve anagram or logic
problems. This class of tasks is thought to impose heavy demands on central
processing resources.

• Identification/Shadowing Task - the subject identifies changing symbols (digits
and/or  letters)  that  appear  on  a  visual  display  by  writing  or  verbalizing,  or  repeating  a 
spoken  passage  as  it  occurs.  Such  tasks  are  thought  to  impose  demands  on 
perceptual  processes  (i.e.,  attention). 

•  Detection  Task  -  the  subject  must  detect  a  specific  stimulus  or  event  which  may  or 
may  not  be  presented  with  alternative  events.  For  example,  to  detect  which  of  4 
lights  is  flickering.  The  subject  is  usually  alerted  by  a  warning  signal  (e.g.,  tone) 
before the occurrence of such events; therefore, attention is required intermittently.
Such tasks are thought to impose demands on perceptual processes.

•  Classification  Task  -  the  subject  must  judge  whether  symbol  pairs  are  identical  in 
form.  For  example,  to  match  letters  either  on  a  physical  level  (AA)  or  on  a  name  level 
(Aa).  Depending  upon  the  requirements  of  the  matching  task,  the  task  can  impose 
demands  on  perceptual  processes  (physical  match)  and/or  cognitive  processes 
(name  match  or  category  match). 

• Psychomotor Task - the subject must perform a psychomotor task such as sorting
different  types  of  metal  screws  by  size.  Tasks  of  this  nature  are  thought  to  reflect 
psychomotor  skills. 

• Spatial Transformation Task - the subject must judge whether information (data)
provided by an instrument panel or radar screen matches information which is
spatially  depicted  by  pictures  or  drawings  of  aircraft.  This  task  involves  perceptual  and 
cognitive processes.

• Speed Maintenance Task - the subject must operate a control knob to maintain a
designated constant speed. This task is a psychomotor type task.

•  Production/Handwriting  Task  -  the  subject  is  required  to  produce  spontaneous 
handwritten passages of prose. With primary tasks that impose a high workload,
subjects' handwriting is thought to deteriorate (i.e., semantic and grammatical errors
increase).

•  Card  Sorting  Task  -  the  subject  must  sort  playing  cards  by  number,  color,  and/or 
suit. Depending upon the requirements of the card sorting rule, the task can
impose  demands  on  perceptual  and  cognitive  processes. 

•  Three  Phase  Code  Transformation  Task  -  the  subject  operates  the  3P-Cotran  which 
is a workstation consisting of three indicator lights, a response board for subject
responses and a memory unit that the subject uses to save his/her responses. The
subject must engage in a 3 phase problem solving task by utilizing information
provided by the indicator lights and recording solutions onto the memory unit. It is a
synthetic  work  battery  used  to  study  work  behavior  and  sustained  attention. 

•  Multi-Task  Performance  Battery  (MTPB)  -  the  subject  operates  a  workstation 
consisting  of  display  panels  and  response  control  panels  for  six  different  tasks 
(choice  RT,  monitoring,  mental  math,  identification,  problem  solving,  and  tracking). 
The  task  battery  is  designed  to  involve  perceptual,  cognitive,  stimulus  encoding,  and 
response  selection  processes. 

•  Occlusion  Task  -  the  subject's  view  of  a  visual  display  is  obstructed  (usually  by  a 
visor). These obstructions are either initiated by the subject or imposed by the
experimenter  in  order  to  determine  the  viewing  time  needed  to  perform  a  task 
adequately. 

•  Simulated  Flight  -  the  flight  simulators  used  in  the  studies  that  were  part  of  our 
analysis were typically commercially available training simulators (e.g., Singer-Link
GAT-1B). Depending on the purpose of the particular study, the subject was
required to perform various maneuvers (e.g., landing approaches) under different
types of conditions such as instrument flight rules or simulated crosswind conditions.
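
As an illustration of how the Sternberg memory task in the list above can separate processing stages, the following is a minimal sketch, not the method of any study reviewed here. It assumes the conventional reading of the task: mean reaction time is regressed on memory set size, the slope is taken to reflect central processing (memory comparison) and the intercept to reflect stimulus encoding and response execution. The function name `sternberg_fit` and the data values are hypothetical.

```python
from statistics import mean


def sternberg_fit(set_sizes, mean_rts_ms):
    """Least-squares fit of RT = intercept + slope * set_size.

    Returns (slope in ms per item, intercept in ms).
    """
    x_bar = mean(set_sizes)
    y_bar = mean(mean_rts_ms)
    sxx = sum((x - x_bar) ** 2 for x in set_sizes)
    sxy = sum((x - x_bar) * (y - y_bar)
              for x, y in zip(set_sizes, mean_rts_ms))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical mean RTs (ms) at memory set sizes 1, 2, 4 and 6.
set_sizes = [1, 2, 4, 6]
single_task = sternberg_fit(set_sizes, [440, 480, 560, 640])
dual_task = sternberg_fit(set_sizes, [520, 560, 640, 720])

# Compare the two fits: a change in slope points toward central
# processing load, a change in intercept toward encoding/response load.
slope_change = dual_task[0] - single_task[0]
intercept_change = dual_task[1] - single_task[1]
print(f"slope change: {slope_change:.1f} ms/item")
print(f"intercept change: {intercept_change:.1f} ms")
```

In this made-up data set only the intercept shifts between conditions, which under the stated assumption would be read as a primary task effect on encoding or response execution rather than on central processing.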


Measures  Used  with  Primary  and  Secondary  Tasks 

The  complexity  of  the  results  found  with  different  secondary  and  primary  task  pairings  can  be 
partly  attributed  to  the  different  and  numerous  performance  measures  that  have  been  used  with  these 
tasks.  Also,  studies  which  have  used  the  same  primary  and  secondary  task  pairings  have  either  used 
different measures of performance or reported different results from the same task measures. In Table A-1,
we have listed several of the frequently reported measures that have been recorded with primary and
secondary tasks. These measures are organized according to the task classification just presented.

Table A-1. Measures utilized to quantify performance on primary and secondary tasks.

TASK                              MEASURE

Choice Reaction Time Task         Mean (median) RT for correct responses
                                  Mean (median) RT for incorrect responses
                                  Number (%) correct responses
                                  Number (%) incorrect responses

Simple Reaction Time Task         Mean (median) RT for correct responses
                                  Number (%) correct responses

Driving Task                      Total time to complete a trial
                                  Number of acceleration rate changes
                                  Number of gear changes
                                  Number of footbrake operations
                                  Number of steering reversals
                                  Number of obstacles hit
                                  High pass steering (standard) deviation
                                  Yaw (standard) deviation
                                  Lateral (standard) deviation

Randomization Task                % redundancy score (bits of information)

Tracking Tasks                    Integrated errors in volts (root mean square error)
                                  Total time on target
                                  Total time off target
                                  Number of times off target
                                  Number of target hits

Monitoring Tasks                  Number (%) of correct detections
                                  Number (%) of incorrect detections
                                  Number (%) of errors of omission
                                  Mean (median) RT for correct detections
                                  Mean (median) RT for incorrect detections

Memory Tasks                      Mean (median) RT for correct responses
                                  Number (%) of correct responses
                                  Number (%) errors of omission
                                  Number (%) of incorrect responses

Mental Mathematics Tasks          Number (%) of correct responses
                                  Mean (median) RT for correct responses
                                  Number (%) of incorrect responses

Michon Interval Tapping Task      Mean interval per trial
                                  Standard deviation of interval per trial
                                  Sum of differences between successive
                                  intervals per minute of total time

Sternberg Memory Task             Slopes and intercepts for RT data
                                  (See memory tasks)

Lexical Decision Task             Mean RT for correct responses

Problem Solving Tasks             Number (%) of correct responses
                                  Number (%) of incorrect responses
                                  Mean (median) RT for correct responses

Identification/Shadowing Task     Number of words correct/minute
                                  Number of digits spoken
                                  Mean time interval between spoken digits
                                  Number of errors of omission

Detection Tasks                   Mean RT for correct detections
                                  Number (%) of correct detections

Classification Tasks              Mean (median) RT for physical match
                                  Mean (median) RT for category match
                                  Number (%) errors for physical match
                                  Number (%) errors for category match

Psychomotor Tasks                 Number of completed items

Spatial Transformation Tasks      Mean RT for correct responses
                                  Number (%) of correct responses
                                  Number (%) of incorrect responses

Occlusion Tasks                   Mean voluntary occlusion time
                                  Percent looking time/total time

Spontaneous Writing               Number of semantic and grammatical errors

Card Sorting                      Number of cards sorted
                                  Number (%) of [illegible] responses

3P-Cotran Task                    Mean (median) RT for different phases of
                                  response required
                                  Number of errors ([illegible]) for different
                                  phases of response required

MTPB                              Mean (median) RT for correct detections
                                  Number (%) of correct detections
                                  Number of problems attempted

Simulated Flight                  Number of vertical accelerations
                                  Mean error from required altitude
                                  Root mean square localizer error
                                  Root mean square glide-slope error
                                  Number of control movements
                                  Pitch high-pass mean square
                                  Roll high-pass mean square
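
Several of the measures in Table A-1 are simple summary statistics. The sketch below shows one plausible way to compute four of them. It is illustrative only: the function names are hypothetical, the tapping variability measure is interpreted here as the sum of absolute successive differences divided by total time in minutes, and the redundancy score is the first-order form (based on single-response frequencies), whereas individual studies may have used other variants.

```python
import math
from collections import Counter
from statistics import mean, median, stdev


def rt_summary(trials):
    """trials: list of (rt_ms, correct) pairs for one condition."""
    correct_rts = [rt for rt, ok in trials if ok]
    return {
        "mean_rt_correct": mean(correct_rts),
        "median_rt_correct": median(correct_rts),
        "pct_correct": 100.0 * len(correct_rts) / len(trials),
    }


def rms_error(errors):
    """Root mean square of sampled tracking error (e.g., in volts)."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def tapping_measures(intervals_s):
    """Interval production measures from a list of tap intervals (s)."""
    total_minutes = sum(intervals_s) / 60.0
    successive = sum(abs(b - a)
                     for a, b in zip(intervals_s, intervals_s[1:]))
    return {
        "mean_interval": mean(intervals_s),
        "sd_interval": stdev(intervals_s),
        "diff_per_min": successive / total_minutes,
    }


def percent_redundancy(responses, n_alternatives):
    """First-order redundancy: 100 * (1 - H / H_max), H in bits."""
    n = len(responses)
    counts = Counter(responses)
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return 100.0 * (1.0 - h / math.log2(n_alternatives))


if __name__ == "__main__":
    print(rt_summary([(400, True), (500, True), (900, False), (600, True)]))
    print(rms_error([0.3, -0.3, 0.3, -0.3]))
    print(tapping_measures([2.0, 2.5, 2.0, 2.5]))
    print(percent_redundancy([1, 1, 2, 2, 3, 3, 4, 4], 4))
```

A perfectly even use of all alternatives gives 0% redundancy in this formulation, and repeating a single response gives 100%, which matches the expectation that responses become more repetitive as workload increases.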

Analysis Conventions for Primary and Secondary Tasks


In order to analyze the results from the studies reviewed, we established conventions to provide a
framework for tabulating such complex findings. We worked with the premise that it was important to
report any indication that dual task pairings resulted in a performance decrement on one or both of the
tasks (primary or secondary). Based on this premise, we formulated the following criteria for priorities on
the results reported in each study:

• Measures which revealed differences between dual-task and single-task performance
were tabulated and preferred over results for measures which showed performance
stability for the same task.

• Measures which revealed decrement for dual-task versus single-task performance
were tabulated and preferred over results for measures which showed performance
enhancement for the same task.

• Measures which revealed dual-task performance decrements with experimental
manipulations (i.e., different sound levels of noise, different levels of task demand for
either secondary or primary tasks, etc.) were tabulated and preferred over results for
measures which showed no effects for the same task.
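
The priority criteria above amount to a simple ordering over outcomes. The following sketch encodes that ordering under one reading of the rules: when several measures are reported for the same task, a decrement is tabulated in preference to an enhancement, and either is tabulated in preference to stability. The outcome labels and the function name `tabulated_outcome` are hypothetical and are used only for illustration.

```python
# Lower number = higher priority for tabulation.
PRIORITY = {"decrement": 0, "enhancement": 1, "stable": 2}


def tabulated_outcome(measure_outcomes):
    """Return the outcome to tabulate for one task in one experiment.

    measure_outcomes: outcome labels ("decrement", "enhancement",
    "stable"), one per performance measure reported for that task.
    Raises ValueError if the list is empty.
    """
    return min(measure_outcomes, key=PRIORITY.__getitem__)


if __name__ == "__main__":
    # Two measures stable, one degraded: the decrement is tabulated.
    print(tabulated_outcome(["stable", "decrement", "stable"]))
    # Stability versus enhancement: the difference is tabulated.
    print(tabulated_outcome(["stable", "enhancement"]))
```

One consequence of this ordering is that the tabulation is deliberately conservative: a single degraded measure is enough for a task to be entered as showing a decrement.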

Analysis of Secondary Task Literature

A systematic approach was undertaken to characterize the wealth of information contained in the
experiments  reviewed.  The  approach  involved  several  steps. 

We  first  characterized  the  studies  reviewed  according  to  the  primary  and  secondary  tasks  employed. 
Table A-2 and Table A-3 contain the results of this effort. As seen in Table A-2, it is evident that the
majority of experiments have involved a select number of secondary tasks. The first four secondary tasks
listed in Table A-2 represent over 50% of the total secondary tasks that comprised our sample. Similarly,
with respect to primary tasks, the first three tasks listed in Table A-3 represent 50% of the total primary
tasks in our sample. It is also evident in examining Table A-3 that a small percentage of the studies
reviewed have employed primary tasks which can be characterized as realistic. That is, primary tasks
typically do not involve multiple sensory input and several types of operator actions and responses
(e.g., driving and simulated flight). Such findings reflect the academic interests of researchers who have
utilized the secondary task paradigm.

We further characterized the articles according to the particular primary-secondary configuration
employed. This was accomplished in two complementary ways. We examined secondary tasks with
respect to the various primary tasks that have been used in association with each secondary task.
Similarly, we examined primary tasks with respect to the various secondary tasks that have been used in
association with each primary task. We were interested in identifying any trends in the results across
similar dual task pairing experiments that would suggest particular dual task pairings are not advantageous
for the operator (i.e., performance decrements for both secondary and primary tasks). Attachment 1

Table A-2. Number of experiments to utilize secondary tasks.

Secondary Task                        Number of Experiments

Monitoring Tasks                       41
Memory Tasks                           32
Choice Reaction Time Tasks             25
Mental Mathematics Tasks               18
Tracking Tasks                         12
Simple Reaction Time Tasks             11
Michon Interval Production Task        11
Identification Tasks                    9
Problem Solving Tasks                   8
Time Estimation Tasks                   7
Detection Tasks                         6
Sternberg Memory Task                   5
Randomization Task                      5
Occlusion Tasks                         4
Psychomotor Tasks                       3
Card Sorting Tasks                      2
Spontaneous Handwriting Task            1
Aircraft [illegible] Task               1
Spatial Transformation Tasks            1
MTPB (monitoring tasks)                 1
Distraction Tasks                       1

TOTAL                                 203

The total 203 represents the instances that these secondary tasks were
used in experiments. Several studies used more than one secondary
task in a single experiment. The total 203 is based on 180 experiments.

Table A-3. Number of experiments that utilized primary tasks with secondary tasks.

Primary Task                          Number of Experiments

Tracking Tasks                         48
Memory Tasks                           26
Monitoring Tasks                       23
Choice Reaction Time Tasks             18
Driving Tasks                          17
Simulated Flight                        8
Detection Tasks                         7
Problem Solving Tasks                   7
Identification Tasks                    5
Classification Tasks                    4
Mental Mathematics Tasks                4
Simple Reaction Time Tasks              3
Psychomotor Tapping Tasks               3
Card Sorting Tasks                      1
Spatial Transformation Task             1
Sternberg Memory Task                   1
Distraction Tasks                       1
Lexical Decision Task                   1
3P-Cotran Task                          1
MTPB                                    1
Psychomotor Tasks                       1

TOTAL                                 181

The total 181 represents the experiments reported in the 147 articles
that we reviewed for our analysis.

contains the results from the analysis of individual secondary tasks each paired with various primary tasks.
Attachment  2  contains  the  results  from  the  complementary  analysis  of  individual  primary  tasks  each  paired 
with  various  secondary  tasks. 
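
The shares quoted earlier for the most frequently used tasks can be checked directly from the counts and totals printed in Tables A-2 and A-3. The short sketch below does this arithmetic; the variable and function names are hypothetical, and the denominators are the totals as stated in the tables.

```python
# Counts for the first four secondary tasks in Table A-2
# and the first three primary tasks in Table A-3.
top_secondary = {
    "Monitoring": 41,
    "Memory": 32,
    "Choice reaction time": 25,
    "Mental mathematics": 18,
}
top_primary = {"Tracking": 48, "Memory": 26, "Monitoring": 23}

SECONDARY_TOTAL = 203  # total stated in Table A-2
PRIMARY_TOTAL = 181    # total stated in Table A-3


def share(counts, total):
    """Percentage of the stated total accounted for by the given tasks."""
    return 100.0 * sum(counts.values()) / total


if __name__ == "__main__":
    print(f"Top four secondary tasks: {share(top_secondary, SECONDARY_TOTAL):.1f}%")
    print(f"Top three primary tasks: {share(top_primary, PRIMARY_TOTAL):.1f}%")
```

Both shares come out above one half (116 of 203 and 97 of 181), consistent with the statement that a small number of task types dominates the sample.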


Discussion  of  Secondary  Task  Analysis  Results 

Perusal of Table A-4 reveals the complex nature of the results that have been reported with various
primary-secondary task pairings. With respect to practical considerations, the results depicted in Table A-4
reflect the need on the part of human factors practitioners to examine the specific demands placed on
operators whenever there are system requirements to perform several tasks at once. This point is
illustrated by examining the major classes of secondary tasks that are depicted in Table A-4.

Inspection of monitoring secondary task experiments reveals several interesting trends. With
monitoring-monitoring dual task pairings, performance on the primary task seems to decline consistently
across experiments, with one exception. A somewhat similar finding is shown in these same experiments
with respect to the monitoring tasks designated as secondary tasks. As these experiments did not place a
greater emphasis on either the primary or the secondary monitoring task with respect to maintaining
performance levels, such findings are possibly due to the high demands that two monitoring tasks combined
place on perceptual processes. When one examines the tracking-monitoring dual task pairing results, a
somewhat different picture emerges. The primary tracking task results exhibit across experiments an
almost equal split between stable performance and degraded performance. However, the experiments
that reported degraded performance for the primary tracking task all placed equal emphasis on both
primary and secondary performance. This may have contributed to subjects' poor performance on
the tracking tasks because subjects may have formed inappropriate strategies for handling the dual task
pairing. Among the experiments that reported stable primary tracking task performance, a greater
emphasis was placed on tracking performance in three of the seven experiments listed in this category.
With respect to the monitoring secondary task results in this dual task configuration, performance appears
stable when the monitoring task is auditory in nature. But when the monitoring task is visual, six out of
seven experiments reported a performance decrement on the monitoring portion. Such findings seem to
indicate that visual tracking-visual monitoring dual task pairings can lead to performance decrements. This
is the case especially on the visual monitoring portion, probably because of the combined visual load
placed on subjects by the two tasks.

For secondary memory task experiments, the results seem to exhibit a trend across experiments
that indicates prior experience with a task is an important factor for dual task performance. With
driving-memory dual task situations, only the memory portion revealed a performance decrement. As these
experiments involved experienced drivers with greater emphasis placed on driving performance, those
factors probably contributed to such driving stability, but at the expense of the memory task. In contrast,
tracking-memory dual task situations resulted in both tasks exhibiting poor performance for most
experiments. These experiments typically involved college students, and their inability to perform this dual
task configuration reflects lack of experience, even though researchers try to provide for task mastery.

With respect to choice reaction time secondary task experiments, it is evident that the dual task
pairing consisting of tracking-choice reaction time results in poor performance on both tasks, with one
exception. The complexity of this dual task situation (i.e., task demands on central processing, response
selection and motor responses for subjects) probably contributes to such poor overall performance.
Similar results are found with experiments that employed mental mathematics secondary tasks. As seen in
Table A-4, dual task pairings with mental mathematics as the secondary task result in poor performance
on both tasks for most experiments. As mental mathematics can be considered a relatively complex set of
cognitive operations, its pairing with almost any primary task configuration except simple tasks (e.g.,
tapping) or highly practiced tasks (e.g., driving) results in poor overall dual task performance.

The above descriptions illustrate the complex results found with all secondary task studies. The
results  reflect  the  complex  interactions  between  the  salient  factors  that  influence  performance  in  dual  task 
situations. 


Conclusions 

The  complexity  of  the  results  just  described  may  seem,  at  first,  to  be  beyond  simple  conclusions 
or  implications.  However,  several  important  issues  can  be  derived  concerning  the  use  of  secondary  tasks 
as  an  OWL  technique  and  dual  task  performance  in  general. 

With  respect  to  secondary  tasks  as  a  workload  estimation  technique,  the  results  described  show 
that  secondary  tasks  can  interfere  with  primary  task  performance.  As  a  result,  inferences  concerning  spare 
capacity  with  a  primary  task  become  difficult  to  interpret.  A  solution  to  this  problem  is  to  employ  tasks  as 
secondary  tasks  that  are  inherently  part  of  a  multitask  system.  Under  these  circumstances,  a  wealth  of 
information is gained even though the primary task may show performance changes, because any
change in performance, whether on the designated secondary or primary task of a system, provides
valuable insight concerning the operator's capabilities and limits in using the system. It is for this reason
that  the  embedded  secondary  task  technique  is  offered  in  Chapter  6  as  the  technique  of  choice  in  system 
design  and  development  environments. 

Another  important  implication  that  is  derived  from  our  analysis  is  that  secondary  tasks  can  result  in 
changes  in  primary  task  performance  that  seem  to  be  reflective  of  subjects'  inappropriate  strategies  with 
respect  to  the  dual  task  situation.  Subjects'  performance  on  the  primary  task  seems  to  be  degraded 
because  the  introduction  of  the  secondary  task  has  changed  the  nature  of  the  situation  with  respect  to 
primary task demands. If your interest is to quantify the spare capacity with respect to a primary task,
then such changes are clearly troublesome. To prevent these possible circumstances, it is necessary to
use secondary task techniques that do not intrude on primary task performance. Several secondary task
techniques are offered in Chapter 6 that minimize this potential confounding. They have been
demonstrated  in  some  applied  settings  not  to  intrude  on  operators'  performance  with  complex  systems 
and to be sensitive to workload levels on such systems.

For dual task performance, the secondary task literature provides, though in some cases
unintentionally,  valuable  information  on  the  critical  factors  that  hinder  multi-task  performance.  These 
factors  are: 

•  inappropriate  operator  strategies  with  respect  to  meeting  the  task  demands  of  several 
tasks  at  once, 

• the potency of certain types of tasks (e.g., mental mathematics) to hinder the ability of
operators  to  perform  any  additional  tasks  that  may  be  required,  and 

• the combined task demand effects of certain task configurations (e.g.,
monitoring-monitoring), which are such that they overload the operator when performed together.

The  human  factors  practitioner  needs  to  be  aware  of  these  factors  in  order  to  ensure  that  performance  on 
complex systems does not suffer from such factors.


APPENDIX A - ATTACHMENT 1


Secondary Task Experiments with Respect to Secondary Task Characteristics


Attachment 1 is shown on the following pages and contains the results from the analysis of
individual primary tasks with respect to secondary task pairings. The table is organized in the following
manner: 

•  The  particular  primary  task  examined  is  identified  in  the  far  left-hand  column  header  of 
each  page.  It  is  indicated  with  the  letter  "P"  preceding  the  primary  task  characteristic, 
for  example  P-monitor. 

•  The  secondary  task  pairings  associated  with  the  particular  primary  task  are  listed  below 
the primary task header in the far left-hand column.

•  The  experiments  that  employed  these  particular  dual  task  pairings  are  listed  by  first 
author  and  year  for  cited  article  reference.  They  appear  in  the  table  under  the 
appropriate  column  with  respect  to  the  results  for  the  primary  task  as  well  as 
secondary task. If an experiment only reports the results for either the primary or
secondary task then the experiment is listed only once under the appropriate column
for  the  results  reported. 

•  Based  on  the  conventions/rules  described  in  Appendix  B,  experiments  are  listed 
under  the  appropriate  column  headers  as  follows: 

P- signifies primary task measures were stable in dual task pairings

P Down signifies primary task measure(s) exhibited a decrement in dual task pairings

P Up signifies primary task measure(s) exhibited an increment in dual task pairings

S- signifies secondary task measures were stable in dual task pairings

S Down signifies secondary task measure(s) exhibited a decrement in dual task
pairings

S Up signifies secondary task measure(s) exhibited an increment in dual task pairings

• Each experiment is listed in the following sequential manner:

First  author's  last  name  only  for  the  cited  reference  article. 

For example, "Dornic"

If there is only one author, then the author's last name is underlined.

For example, "Dornic"

The year the cited reference article was published.

For example, Dornic80

If the cited reference article contained more than one experiment, then the particular
experiment is indicated within parentheses.

For example, Dornic80(1)

The primary task's mode for stimulus information:

V-visual input
A-auditory input
T-cutaneous input

A+V-both auditory and visual simultaneously
A/V-both auditory and visual but not simultaneously
-- not appropriate

For example, Dornic80(1)v

The secondary task's mode for stimulus information:

V-visual input
A-auditory input
T-cutaneous input

A+V-both auditory and visual simultaneously
A/V-both auditory and visual but not simultaneously
-- not appropriate

For example, Dornic80(1)va

The emphasis placed on maintaining performance for either primary or secondary
tasks during dual task pairings as specifically stated in the article or implied by payoff
matrices (e.g., $10 for high scores on the primary task).

P-primary  task  emphasized 

S-secondary  task  emphasized 

Blank-both  secondary  and  primary  are  emphasized  or  the  authors  do  not  specifically 
state  the  performance  emphasis  placed  on  subjects  therefore  assumed  equal 
emphasis  for  both  primary  and  secondary  tasks 

For example, Dornic80(1)vap

If the experiment contained data that allowed the determination that either the
secondary or primary task performance changed (i.e., increment or decrement) or was
stable in dual task pairings but was not specifically addressed by the authors, this is
indicated under the appropriate primary or secondary result column header as
interpolated.

For example, Dornic80(1)vapIntrp
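
The experiment code described above is compact enough to be decoded mechanically. The following is a rough sketch of such a decoder, written as an illustration rather than as part of the original analysis. It assumes the fields appear in exactly the order given (author, two-digit year, optional experiment number in parentheses, primary mode, secondary mode, optional emphasis letter, optional interpolation marker). The class and function names are hypothetical, and underlining of a single author's name is not represented.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Stimulus modes: longer alternatives first so "a+v" is not read as "a".
MODE = r"a\+v|a/v|--|v|a|t"

CODE_RE = re.compile(
    rf"^(?P<author>[A-Za-z]+)(?P<year>\d{{2}})"
    rf"(?:\((?P<exp>\d+)\))?"
    rf"(?P<pmode>{MODE})(?P<smode>{MODE})"
    rf"(?P<emph>[ps])?(?P<intrp>intrp)?$",
    re.IGNORECASE,
)

EMPHASIS = {"p": "primary", "s": "secondary", None: "equal or unstated"}


@dataclass
class ExperimentCode:
    author: str
    year: int                  # two-digit year as printed in the table
    experiment: Optional[int]  # None when the article has one experiment
    primary_mode: str
    secondary_mode: str
    emphasis: str
    interpolated: bool


def parse_code(code: str) -> ExperimentCode:
    """Decode one table entry; raises ValueError if it does not match."""
    m = CODE_RE.match(code.strip())
    if m is None:
        raise ValueError(f"unrecognized experiment code: {code!r}")
    emph = m["emph"].lower() if m["emph"] else None
    return ExperimentCode(
        author=m["author"],
        year=int(m["year"]),
        experiment=int(m["exp"]) if m["exp"] else None,
        primary_mode=m["pmode"].upper(),
        secondary_mode=m["smode"].upper(),
        emphasis=EMPHASIS[emph],
        interpolated=m["intrp"] is not None,
    )


if __name__ == "__main__":
    print(parse_code("Dornic80(1)vap"))
```

Listing the two-input modes before the single-letter ones in the pattern matters, since otherwise an entry with an "A+V" primary mode would be split incorrectly.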

APPENDIX A - ATTACHMENT 2


Secondary Task Experiments with Respect to Primary Task Characteristics


Attachment  2  is  shown  on  the  following  pages  arvi  cor  tains  the  results  from  the  analysis  ot 
individual  primary  tasks  with  respect  to  sooonriary  task  pairings.  The  table  is  organised  in  tl«e  following 
manner: 


*  The  particular  primary  task  examined  is  identified  in  the  far  left  hand  column  header  of 
each  page.  It  is  Indicated  with  the  letter  "P*  preceding  the  primary  task  characteristic, 
fur  example  P-monitor. 

*  The  seuendei  y  task  pairings  associated  with  the  particular  primary  task  are  feted  bekjw 
the  primary  task  header  In  hie  far  left  -tvind  column. 

*  The  experiments  that  employed  tnuso  particular  Aral  task  pairings  are  listed  by  first 
author  and  yea;  tor  cited  article  reference,  'they  appear  in  the  table  under  the 
appropriate  column  with  respect  to  the  results  for  the  primary  task  as  well  as 
secondary  task,  if  an  experiment  only  reports  the  results  for  either  the  primary  or 
secondary  task  then  the  experiment  is  listed  -.nty  once  under  the  appropriate  column 
for  the  results  reported. 

<  Based  on  the  convent ionv'rules  described  in  Appendix  B  0x4*1  rime  nt*  are  listed 
under  the  appropriate  column  headers  as  follows: 

P- signifies primary task measures were stable in dual task pairings

P Down signifies primary task measure(s) exhibited a decrement in dual task pairings

P Up signifies primary task measure(s) exhibited an increment in dual task pairings

S- signifies secondary task measures were stable in dual task pairings

S Down signifies secondary task measure(s) exhibited a decrement in dual task
pairings

S Up signifies secondary task measure(s) exhibited an increment in dual task pairings

* Each experiment is listed in the following sequential manner:

First author's last name only for the cited reference article.

For example, "Dornic"

If there is only one author, then the author's last name is underlined.

For example, "Dornic"

The year the cited reference article was published.

For example, Dornic80

If the cited reference article contained more than one experiment, then the particular
experiment is indicated within parentheses.

For example, Dornic80(1)
The primary task's mode for stimulus information:

V-visual input
A-auditory input
T-cutaneous input

A+V-both auditory and visual simultaneously
A/V-both auditory and visual but not simultaneously
--not appropriate

For example, Dornic80(1)v

The secondary task's mode for stimulus information:

V-visual input
A-auditory input
T-cutaneous input

A+V-both auditory and visual simultaneously
A/V-both auditory and visual but not simultaneously
--not appropriate

For example, Dornic80(1)va

The emphasis placed on maintaining performance for either primary or secondary
tasks during dual task pairings, as specifically stated in the article or implied by payoff
matrices (e.g., $10 for high scores on the primary task).

P-primary task emphasized

S-secondary task emphasized

Blank-both secondary and primary are emphasized, or the authors do not specifically
state the performance emphasis placed on subjects, in which case equal
emphasis is assumed for both primary and secondary tasks

For example, Dornic80(1)vap

If the experiment contained data that allowed the determination that either the
secondary or primary task performance changed (i.e., increment or decrement) or was
stable in dual task pairings but was not specifically addressed by the authors, this is
indicated under the appropriate primary or secondary result column header as
interpolated.

For example, Dornic80(1)vapintrp
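
The sequential code described above can be decoded mechanically. The following Python sketch is illustrative only and is not part of the original report. The names parse_code, ExperimentCode, MODES, and EMPHASIS are hypothetical, and the sketch assumes that a complete code always carries both a primary and a secondary mode, in that order, followed by the optional emphasis letter and the optional "intrp" suffix.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Stimulus-mode letters as listed in the legend (lower case, as in the examples).
MODES = {
    "v": "visual",
    "a": "auditory",
    "t": "cutaneous",
    "a+v": "auditory and visual, simultaneous",
    "a/v": "auditory and visual, not simultaneous",
    "--": "not appropriate",
}

# Emphasis letter; a blank means equal emphasis is assumed.
EMPHASIS = {"p": "primary", "s": "secondary", "": "equal (assumed)"}

_MODE = r"a\+v|a/v|--|[vat]"
_PATTERN = re.compile(
    rf"(?P<author>[A-Za-z]+)(?P<year>\d{{2}})"   # first author, two-digit year
    rf"(?:\((?P<exp>\d+)\))?"                    # optional experiment number
    rf"(?P<pmode>{_MODE})(?P<smode>{_MODE})"     # primary mode, secondary mode
    rf"(?P<emph>[ps]?)(?P<intrp>intrp)?",        # optional emphasis, interpolation
    re.IGNORECASE,
)


@dataclass
class ExperimentCode:
    author: str
    year: str                   # kept as the two digits printed in the table
    experiment: Optional[int]   # None when the article reports one experiment
    primary_mode: str
    secondary_mode: str
    emphasis: str
    interpolated: bool


def parse_code(code: str) -> ExperimentCode:
    """Decode one table entry such as 'Dornic80(1)vapintrp'."""
    m = _PATTERN.fullmatch(code.strip())
    if m is None:
        raise ValueError(f"not a recognized experiment code: {code!r}")
    return ExperimentCode(
        author=m["author"],
        year=m["year"],
        experiment=int(m["exp"]) if m["exp"] else None,
        primary_mode=MODES[m["pmode"].lower()],
        secondary_mode=MODES[m["smode"].lower()],
        emphasis=EMPHASIS[m["emph"].lower()],
        interpolated=m["intrp"] is not None,
    )
```

Under these assumptions, parse_code("Dornic80(1)vapintrp") yields first author Dornic, year 80, experiment 1, a visual primary task, an auditory secondary task, primary task emphasis, and an interpolated result. Partial codes used in the step-by-step examples, such as Dornic80(1)v, are rejected because they lack a secondary mode.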